A METHOD FOR SEARCHING HETEROGENEOUS
COMPOUND DATABASES USING TOPOMERIC SHAPE 
DESCRIPTORS AND PHARMACOPHORIC FEATURES 

A portion of the disclosure of this patent document contains material which is subject to 
copyright protection. The copyright owner has no objection to the facsimile reproduction by 
anyone of the patent document or the patent disclosure, as it appears in the Patent and 
Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. 

BACKGROUND OF THE INVENTION 

Field of the Invention: 

This invention relates generally to the field of pharmaceutical research and to the three 
dimensional searching of structures of chemical compounds to identify compounds which may 
share a biological activity with a known compound. In particular the invention concerns a 
method for searching databases of commercially available compounds which may or may not 
share any common synthetic lineage.

Description of Related Art:

The advent of high throughput screening of chemical compounds for biological activity 
has dramatically changed the paradigm of pharmaceutical research in recent years. Coupled with 
combinatorial synthesis, it is now possible to test millions of compounds on an efficient basis. 
However, the cost per hit of such searching remains extremely high given the enormous number 
of compounds which can be tested and the typically low "hit" rates which are achieved. As a 
result, greater emphasis has been placed on the testing of compound libraries which are believed
to contain a higher percentage of potentially relevant molecules. The skills of computational
chemists have been employed to design such compound libraries for testing. 

Two types of libraries were considered possible: first, a library which explored the
diversity of structures in chemical space across the range of compounds which could be 
synthesized without oversampling the same area of diversity space (redundant testing); and
second, a library in which the compounds would be likely to have the same biological activity 
as a known molecule or drug. The major problem confronting computational chemists in the 
selection of compounds for such libraries was how to characterize the compounds in a manner 
which would permit the desired selections. Bioscientists have long known that the three 
dimensional shape of a compound which acts as a ligand to a larger biomolecule must be
complementary to the shape of the binding site of the larger biomolecule. In studying the
relationships between the chemical structure of a molecule and its biological activity (structure
activity relationships (SAR)), many techniques to characterize the three dimensional shape of
molecules were devised. One of the most successful of the techniques for generating a
quantitative structure activity relationship (QSAR) characterized the shape of molecules by
defining an interaction energy field between a probe molecule and each part of the studied
molecule in a three dimensional grid surrounding the molecule. The shape data thus generated 
for a series of molecules could be correlated with the biological activity of the molecules to 
produce the QSAR. This technique by Cramer and Wold (Comparative Molecular Field Analysis 
[CoMFA]) is described in detail in U.S. Patent No. 5,025,388 and U.S. Patent No. 5,307,287.

Use of the CoMFA approach required detailed considerations of two major factors: 1)
the proper alignment of the test molecules; and 2) the conformation or conformations of the
molecules which had to be taken into account. In addition, the technique worked only with 
molecules sharing the same biological activity. However, the technique clearly demonstrated the 
power of utilizing three dimensional shape descriptors in molecular analysis.

Over time many three dimensional shape descriptors and methods of library selection
were attempted by computational chemists. U.S. Patent No. 5,703,792 to Chapman describes
one such approach. Two major problems confronted the field and cast doubt on the generality 
or accuracy of all the methods which had been devised. The first problem was that no one could 
show that the molecular structural descriptors which had been used were generally valid; that 
is, that the descriptors described molecules in a manner which correlated with biological activity
across a range of biological systems. Any descriptor which would be used to select compounds
for libraries would have to be valid irrespective of the biological activity which might be tested
against the library. The second problem was that there was likewise no way to demonstrate that
the methods of handling multiple conformations in the prior art methods were either accurate
or applicable across all types of molecules.

The solution to these problems by Cramer, Patterson, Clark, and Ferguson is taught in
U.S. Patent No. 6,185,506. The validity of a molecular structural descriptor can be
demonstrated across multiple biological activities by employing the Patterson plot methodology 
described in the patent. Both two and three dimensional descriptors can be evaluated by the 
methodology, and, in principle, there is no limitation on the dimensionality of the descriptors
which can be evaluated. Using the validation technique, valid descriptors were identified which
could be used with assurance to design libraries having desired properties. By this method the
two dimensional prior art fingerprint Tanimoto descriptor was shown to be valid as well as a 
new three dimensional descriptor described below. The validation methodology also identified 
a neighborhood distance characteristic of the descriptors which could be used in the design of 
the libraries. In addition, the neighborhood distance led directly to methods for searching the 
libraries, and, once a molecule had shown activity in a screen, for expanding the search for 
other molecules having the same activity. 

Further, a solution to the problem of identifying a generally appropriate molecular 
conformation or conformations to take into account was taught. An alignment rule for molecular 
parts (topomeric alignment) is demonstrated which generates a uniform orientation. The shape 
of the molecular part is characterized, as in CoMFA, by a field of interaction energies calculated 
between a probe and the atoms in the aligned molecular part at each point in a three dimensional 
grid surrounding the molecular part. The steric interaction energies are principally used 
although, in the appropriate circumstances, electrostatic interaction energies may be added. 
Although the alignment may be arbitrary and unlikely for any particular molecule, the field 
shape descriptor of the topomeric alignments was shown to be a valid molecular structural 
descriptor by means of the Patterson plot method. 

Using descriptors having an associated neighborhood distance, molecules could be 
identified which shared shape characteristics in a way which was meaningfully related to their 
biological activity. The problems of efficient library design and selection of combinatorially 
accessible molecules could be further addressed. In U.S. Patent Application No. 08/903,217,
presently allowed, the construction and searching of a virtual library is described. The virtual
library contains validated molecular structural descriptions of each component part which could 
be used in a specified combinatorial synthesis. All possible product molecules which could be 
combinatorially derived from the component parts can be searched, without the necessity of 
generating the product structures during the search, for product molecules having desired
properties by searching through only a combination of the descriptors of the component parts 
of the product molecules. In the preferred embodiment the Tanimoto and the three dimensional 
topomeric CoMFA descriptors are employed. 

Due to the combinatorial nature of the number of product molecules whose characteristics 
can be determined, a relatively small number of structural variations (tens of thousands), cores,
and synthetic schemes employing only two attachment points can yield a searchable library of
billions of possible molecules according to the method of the patent. Indeed, the searchable
molecules outnumber the molecules ever reported by several orders of
magnitude. By the techniques disclosed in the patent, this virtual library can be searched very
quickly to construct diverse libraries of molecules likely to share the same biological activity or to
find molecules which share the same biological activity as a combinatorially derived query
molecule. Further, query molecules which derive from unknown synthetic routes can be
fragmented and the molecular descriptor characterization of the fragments used to search for
similarly shaped fragments and potential molecules with likely similar biological activity defined
in the virtual library. In practice the topomeric field molecular structural descriptor has proven
to be very valuable in searching the virtual library. The powerful and fast searching capabilities
of the virtual library method have yielded significant advances.

However, the molecules in the virtual library which can be searched by definition derive 
from a combinatorial assembly of a relatively small number of constituent parts and can be said
to be homogeneous in that sense. By virtue of the exceedingly large size of the virtual library, 
molecules may be identified which are not readily available. Also, although the possible product 
molecules which can be searched are the result of known combinatorial synthetic schemes, the 
actual synthesis may not be easily achieved. In the day to day world of pharmaceutical research, 
large assemblages of available molecules can be commercially obtained. These assemblages are 
not the result of any particular combinatorial synthesis but rather represent the assembly of a 
wide range of molecules from many different sources and syntheses, some known, some 
unknown. Therefore, these assemblages of molecules can be characterized as heterogeneous. 

It would be useful if heterogeneous assemblages of available molecules could be searched 
for molecules which are likely to have a biological activity similar to a known compound before 
synthesis of new compounds is undertaken with the concomitant additional time and expense. 

BRIEF SUMMARY OF THE INVENTION 

Databases which contain the structures of a heterogeneous assembly of available molecules
can be searched for molecules having a biological activity similar to a known compound. Each 
molecule specified by the database is split into several fragments according to defined rules and 
the shape of those fragments is compared to the shape of the fragments generated from a query 
molecule using the topomeric field molecular structural descriptor. The molecules having the 
closest matching shapes to the query molecule are selected for further testing.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows a number of possible ways to fragment a molecule into two pieces in 
accordance with the fragmentation rule. 

Figure 2 shows a number of possible ways to fragment a molecule into three pieces in 
accordance with the fragmentation rule.

DETAILED DESCRIPTION OF THE INVENTION 
Computational Environment: 

Generally, all calculations and analyses to perform the method of the disclosed invention 
are implemented in a modern computational chemistry environment using software designed to 
handle molecular structures and associated properties and operations. For purposes of this
Application, such an environment is specifically referenced. In particular, the computational
environment and capabilities of the SYBYL, UNITY, and CONCORD software programs
developed and/or marketed by Tripos, Inc. (St. Louis, Missouri) are specifically utilized. The
software code to implement the method of the disclosed invention is set out in the Appendices
to this Application. Software with similar functionalities to SYBYL, UNITY, and CONCORD is
available from other sources, both commercial and non-commercial, well known to those in the
art. A general purpose programmable digital computer with ample amounts of memory and hard 
disk storage is required for the implementation of this invention. In performing the methods of 
this invention, representations of thousands of molecules and molecular structures as well as 
other data may need to be stored simultaneously in the random access memory of the computer
or in rapidly available permanent storage. The inventors use Silicon Graphics, Inc. (SGI)
"R12000" computers having 350 - 400 MHz processors and between 256 Mb and 512 Mb of
memory with 8 - 10 Gb hard drive storage disks. In addition SGI "Origin" or "O2" or "O2100"
computers can be used. Access to several gigabytes of storage and faster Silicon Graphics, Inc. 
processors is useful.

Incorporation of Patent Disclosures:

The disclosures of U.S. Patent 6,185,506 and of U.S. Patent Application No. 08/903,217 
are expressly and completely incorporated into this application as if fully set forth herein. 
Topomeric Alignment: 

As taught in the incorporated U.S. Patent and patent application, molecular fragments 
may be aligned following topologically-based rules to generate a single, consistent,
unambiguous, aligned topomeric conformation. The procedure also takes full account of chiral
atoms. All fragments which are to be compared in a search must be aligned with the same
topomeric rules. In the present method such a topomeric alignment is used, the details of which
are fully set out in the attached software code.

Calculation Of Fields:

The basic CoMFA methodology provides for the calculation of both steric and
electrostatic fields. It has been found up to the present point in time that using only the steric 
fields yields a better molecular structural descriptor than a combination of steric and electrostatic 
fields. There appear to be three factors responsible for this observation. First is the fact that 
steric interactions - classical bioisosterism - are certainly the best defined and probably the most
important of the selective non-covalent interactions responsible for biological activity. Second,
adding the electrostatic interaction energies may not add much more information since the
differences in electrostatic fields are not independent of the differences in steric fields. Third, 
the addition of the electrostatic fields will halve the contribution of the steric field to the 
differences between one shape and another. This will dilute out the steric contribution and also 
dilute the neighborhood property. Clearly, reducing the importance of a primary descriptor is 
not a way to increase accuracy. However, it is certainly possible that in a given special situation 
the electrostatic contribution might contribute significantly to the overall "shape". Under these 
unique circumstances, it would be appropriate to also use the electrostatic interaction energies 
or other molecular characteristics, and such are considered within the scope of this disclosure.
In particular, as will be discussed below, it has been found that the additional information 
typically associated with pharmacophore mapping can be utilized to further characterize the 
similarity between topomerically aligned molecular fragments. 

The steric fields of the topomerically aligned molecular fragments are generated almost 
exactly as in a standard CoMFA analysis using an sp3 carbon atom as the probe. In standard
CoMFA, both the grid spacing and the size of the lattice space for which data points are
calculated will depend on the size of the molecule and the resolution desired. Typically, a 2 Å
grid spacing is employed both in CoMFA and in the heterogeneous database searching method
of the present disclosure. However, the grid dimensions are varied in the present invention. For
query molecules, the size of the grid is adjusted to encompass the smallest region that all of the 
query fragments will fit into. This significantly reduces the number of calculations that are 
necessary without reducing the ability of the descriptor to fully characterize the structures. This
modification will be discussed in more detail below. The steric fields are set at a cutoff value
(maximum value) as in standard CoMFA for lattice points whose total steric interaction with any 
side-chain atom(s) is greater than the cutoff value. 

One difference from the usual CoMFA procedure is that atoms which are separated by 
one or more rotatable bonds are set to make reduced contributions to the overall steric field. An 
attenuation factor, preferably about 0.85, is applied to the steric field contributions which result 
from these atoms. For atoms at the end of a long molecule, the attenuation factor produces very 
small field contributions (i.e., [0.85]^N, where N is the number of rotatable bonds). This attenuation
factor is applied in recognition of the fact that the rotation of the atoms provides for a flexibility 
of the molecule which permits the parts of the molecule furthest away from the point of 
attachment to assume whatever orientation may be imposed by the unknown receptor. If such 
atoms were weighted equally, the contributions to the fields of the significant steric differences 
due to the more anchored atoms (whose disposition in the volume defined by the receptor site 
is most critical) would be overshadowed by the effects of these flexible atoms. 
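
The attenuation described above can be illustrated with a minimal sketch. Only the factor of about 0.85 and the [0.85]^N form come from the text; the function name and the idea of passing a per-atom count of intervening rotatable bonds are assumptions made for illustration and are not taken from the software in the Appendices.

```python
ATTENUATION = 0.85  # preferred value stated in the text


def attenuation_weight(n_rotatable_bonds, factor=ATTENUATION):
    """Weight applied to one atom's steric field contribution.

    n_rotatable_bonds is the number of rotatable bonds separating the
    atom from the attachment point (a hypothetical input for this sketch).
    """
    if n_rotatable_bonds < 0:
        raise ValueError("bond count must be non-negative")
    return factor ** n_rotatable_bonds


# Atoms separated from the attachment point by 0, 1, 5 and 10 rotatable bonds.
weights = [attenuation_weight(n) for n in (0, 1, 5, 10)]
```

With these values an anchored atom keeps its full contribution, while an atom ten rotatable bonds away contributes less than one fifth of its unattenuated field.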
Topomer Similarity: 

The notion of topomer similarity between a pair of molecules is defined as the "distance" 
represented by the difference between the molecular fields which serve to characterize the 
molecules' shapes. As an example, assume two molecules A and B which have each been 
placed in their topomeric alignment and the steric field values calculated for each point in the 
surrounding three dimensional grids. Let each grid point be denoted by its corresponding 
Cartesian X, Y, Z coordinate so that for each molecule the grid points are defined as $X_0, Y_0, Z_0$
through $X_N, Y_N, Z_N$. For each molecule A and B the field values, $V^A$ and $V^B$, at each point in the
grid are denoted as:

$$V^A_{X_0}, V^A_{Y_0}, V^A_{Z_0}, \ldots, V^A_{X_N}, V^A_{Y_N}, V^A_{Z_N} \quad \text{and} \quad V^B_{X_0}, V^B_{Y_0}, V^B_{Z_0}, \ldots, V^B_{X_N}, V^B_{Y_N}, V^B_{Z_N}$$

The root sum square of distances between the fields is then defined as:

$$\sqrt{\left(V^A_{X_0} - V^B_{X_0}\right)^2 + \left(V^A_{Y_0} - V^B_{Y_0}\right)^2 + \left(V^A_{Z_0} - V^B_{Z_0}\right)^2 + \cdots + \left(V^A_{Z_N} - V^B_{Z_N}\right)^2}$$

This distance is conveniently denoted as:

$$A{:}B$$

For identical molecular structures, the distance equals 0. Therefore, the closer the value of the 
distance is to zero, the closer in shape two molecules will be. When searching among many 
possible structures, the minimum calculated value of the distance is sought. 
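
As an illustration of the distance just defined, the following minimal sketch computes the root sum of squared differences between two fields. It assumes each field is supplied as a flat sequence of values sampled at the same grid points in the same order; the function name is hypothetical.

```python
import math


def topomer_distance(field_a, field_b):
    """Root sum of squared differences between two steric fields.

    Both fields must be sampled on the same grid, point for point
    (an assumption of this sketch).
    """
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(field_a, field_b)))


identical = topomer_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = topomer_distance([0.0, 0.0], [3.0, 4.0])
```

Identical fields give a distance of zero, and the distance grows as the field values diverge.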
Fragmentation: 

The following critical question which frequently occurs in chemical research, and 
especially in biological research, can now be addressed. The problem, as it is usually presented, 
takes the form: given an arbitrary query molecule (generally one previously found to exhibit a 
desired activity), find biologically similar molecules, that is molecules of similar 3D shape and 
activity. Generally, such a query molecule will not have resulted from a combinatorial synthesis, 
and, in fact, no knowledge of a possible synthetic route to the molecule may be available. In 
searching the virtual library of Application No. 08/903,217, the topomeric 3D shape data within 
the virtual libraries actually describe fragments (structural variations) of molecules. To find 
similarly shaped molecules within the virtual library, the query molecule must be fragmented
and the shapes of its fragments compared with the shapes of corresponding fragments (structural
variations) in the virtual library. The difficulty is that a query molecule can be fragmented in 
so very many ways. The solution adopted for virtual library searching was a way to emphasize 
those fragmentations that are most likely to conform to efficient synthetic routes from available 
starting materials, without requiring the searcher of the virtual library to have any knowledge 
of what synthetic routes it includes. 

The solution employed a "fragmentation table", where each row constitutes a rule of the 
following sort: "for each occurrence of this particular structural feature combination (structural 
variation) in the query molecule, decompose the query molecule in a particular way specified 
in terms of this structural feature, and search only those combinatorial libraries that utilize 
specified reactions (sequences) and/or building blocks, mapping specified query fragments onto 
specified classes of building blocks". Each such query decomposition found generates a search 
of the virtual library, returning all those products whose sum of squares of differences in shape 
between corresponding product and query fragments is less than a user specified neighborhood 
distance threshold. Passing the query molecule (by means of a suitable computer program) 
against all the rows of this table generates all searches. 

The situation is much more complicated when a search of a database of heterogeneous 
compounds is desired. Not only is it necessary to fragment the query molecule, but each 
molecule in the database has to be likewise fragmented and comparisons made between the query 
fragments and the fragments arising from each molecule. Typically, anywhere from 2 to 50 
different fragments might be generated by fragmenting each molecule in the database. To
compare 6 fragments from a query molecule to an average of 20 fragments from each of 50,000
molecules in a heterogeneous database would require 6 × 20 × 50,000 = 6,000,000 field
comparisons. [Actually, as will be described below, because fragment pairs or triplets are 
involved, cross comparisons increase this number.] This is at least an order of magnitude greater 
than the typical 6 fragment query comparison to even 50,000 structural variations in the virtual 
library. In principle, a virtual library of every fragment occurring in all of the molecules in all
examined heterogeneous databases could be assembled, but the size of such a virtual library and
the complexities of searching are not trivial. 
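
The comparison counts quoted above can be checked with a short calculation. The variable names are illustrative only, and the figures are the ones given in the text.

```python
query_fragments = 6
avg_fragments_per_db_molecule = 20
db_molecules = 50_000
virtual_library_variations = 50_000

# Heterogeneous database: every query fragment against every database fragment.
heterogeneous_comparisons = (
    query_fragments * avg_fragments_per_db_molecule * db_molecules
)

# Virtual library: every query fragment against every structural variation.
virtual_comparisons = query_fragments * virtual_library_variations

ratio = heterogeneous_comparisons / virtual_comparisons
```

The heterogeneous search needs 6,000,000 field comparisons against 300,000 for the virtual library, a factor of 20 before any cross comparisons between fragment pairs or triplets are counted.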

The method adopted for the present invention does not precalculate and store the metric 
characteristics of each fragment of each heterogeneous database molecule. Rather, as each
molecule is fragmented, the topomeric alignment and associated field is generated on-the-fly for 
each fragment and compared to the topomerically aligned field of a query molecule fragment. 
While the full fragmentation table scheme employed with the virtual library of Application No. 
08/903,217 may be employed, experience with fragmentations has shown that for medicinal type 
molecules the following fragmentation rule (which is a subset of the more general fragmentation 
method) produces meaningful fragments:

"Break the molecule at acyclic bonds either singly or in pairs to generate sets of either
2 or 3 fragments respectively where each fragment must contain greater than a user
specified number of heavy atoms."

Assuming a setting that every fragment must contain at least three heavy atoms, Figure 
1 shows an example of how the rule is applied in a typical molecule (either a query molecule
or a database molecule) to generate fragments. To generate the fragments, the whole structure
is evaluated for each new fragmentation position. The two-piece fragmentations which will be
performed are indicated by the thick lines. The two-piece fragmentations that will not be
performed (because one of the resulting fragments contains fewer than three heavy atoms) are
indicated by the thin lines. In this example, if, instead of requiring three heavy atoms, the user 
required five heavy atoms, then only the fragmentation between the two rings would be 
performed. 
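
The two-piece case of the fragmentation rule can be sketched as follows. This is an illustration only and not the code of the Appendices: the molecule is assumed to be a single connected graph given as a list of element symbols and a list of bonded atom index pairs, a bond is treated as acyclic when removing it disconnects the graph, and the threshold is applied as "at least min_heavy heavy atoms", matching the worked example in the text. All names are hypothetical.

```python
from collections import deque


def component(start, adjacency, removed_bond):
    """Atoms reachable from start without crossing removed_bond."""
    blocked = set(removed_bond)
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if {atom, nbr} == blocked or nbr in seen:
                continue
            seen.add(nbr)
            queue.append(nbr)
    return seen


def two_piece_fragmentations(elements, bonds, min_heavy=3):
    """List (bond, side_a, side_b) for every allowed two-piece split."""
    adjacency = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def heavy(atoms):
        return sum(1 for i in atoms if elements[i] != "H")

    results = []
    for a, b in bonds:
        side_a = component(a, adjacency, (a, b))
        if b in side_a:
            continue  # still connected, so the bond lies in a ring
        side_b = set(adjacency) - side_a
        if heavy(side_a) >= min_heavy and heavy(side_b) >= min_heavy:
            results.append(((a, b), side_a, side_b))
    return results


# Toy molecule: a three-membered ring (atoms 0-2) carrying a three-atom chain (3-5).
elements = ["C"] * 6
bonds = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
cuts = two_piece_fragmentations(elements, bonds, min_heavy=3)
```

For the toy molecule only the bond joining the ring to the chain satisfies a three heavy atom minimum; lowering the minimum to two also admits the next bond along the chain.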

An example of a three piece fragmentation is shown in Figure 2. Assuming again a 
setting that every fragment must contain at least three heavy atoms, the heavy lines indicate, by
arrows, the two positions at which the molecule would be fragmented into 3 fragments. The light
lines indicate, by arrows, some of the three piece fragmentations that will not be performed
because at least one of the fragments has fewer than three heavy atoms. If, instead of requiring 
three heavy atoms, the user required five heavy atoms, then no three-piece fragmentations would 
be performed. 

At the present time, it has been found that generating three fragments is necessary when 
a two fragment scheme does not yield significant results. The three fragment scheme seems to 
find similar shapes that are sometimes missed in two fragment analysis. However, due to the 
higher computational overhead of three fragment searching, searches are first performed at the 
two fragment level. Four fragment searches may be necessary for some types of molecules, but 
at the time of filing the present disclosure, such situations have not been identified. Clearly the 
searching method of the present invention is not limited by the number of fragments which are
generated but is generally applicable to as many fragments as the user wishes to consider.

Topomeric 3D Searching:

When analyzing molecules for shape similarity, it should be recognized that not all the 
elements of a molecule's shape may be required for proper interaction with a larger biomolecule. 
Perhaps in some instances, the entire shape is critical to the match. In other instances, only part 
of the molecule's shape may be critical to the match and other parts relatively unimportant. 
When comparing shapes of query molecules to those found in a heterogeneous database, it is
important to be able to compare not only the overall shape of the molecules, but also subparts. 
The method and software of the present invention permit many types of shape comparisons as 
will be discussed below. 

Different heterogeneous databases of compounds store compound structures in different
formats such as SMILES, SLN, or an MDL format. Many software programs are available for 
interconverting the structures from one format to another. For the present application, the 
inventors use UNITY to convert compound information to SLN (Sybyl Line Notation) format. 
Compound information is then transferred to the CONCORD software program. CONCORD 
generates the three dimensional structure of the molecule. The starting point for topomeric
searching of compounds listed in a heterogeneous database is the set of CONCORD generated three
dimensional structures of the database molecules and the query molecule. These structures are 
provided as input to the software programs set forth in the Appendices to the present disclosure. 

The user specified fragmentation pattern (2 or 3 fragments and the number of included 
heavy atoms) is applied to the query molecule and the first database specified molecule. After
each set of shape comparisons, the next database specified molecule is taken up in order. After
the fragmentation patterns have been identified for each molecule (query or database), each 
fragment is aligned according to the topomeric rules. 

In the preferred embodiment, the fragment is translated and placed into the grid so that 
the atom from which the "broken" acyclic bond extends into the fragment of interest is placed 
at the 0,0,0 coordinate. The "broken" bond (the attachment bond) is then directed along the X 
axis (standard topomer alignment) and the part of the molecule which is considered the fragment 
is aligned topomerically in the gridded space. Alternatively, the atom in the fragment of interest
which is connected to the acyclic bond which is "broken" is placed at the 0,0,0 position. This
results in virtually insignificant differences in the topomer distances which are calculated. 

Another feature of the present method is that a variable size grid region is used. Since 
some fragments are small and others large, the same volume of three dimensional grid space is 
not required to contain each fragment. Placing a small fragment in a large grid space gains
nothing and only results in calculating an unnecessary number of extra grid location
interactions. For the query molecule, the grid is adjusted to encompass the smallest region in
which all the query fragments will fit. For database molecule fragments, the initial database 
molecule grid is one unit larger in all dimensions than the grid determined for the query
fragments. The grid size is expanded by one unit in each dimension until the accumulated sum 
of the grid intersection points (starting with the query grid size and adding all the intersection 
points contained in each expanded grid) is greater than 10,000 or the grid has been expanded 
from its initial size by 11 units in each dimension. This procedure is followed since most
computers, even those configured for molecular modeling, have a memory capacity which can
be exceeded by allowing for unlimited grid size and number of intersection points. The grid size 
limitations are not required by the inherent method of the invention. Compression of the data 
from the thousands of data points in a large grid also aids in reducing the memory requirement 
for large grids. When a situation is encountered where the database molecular fragment extends 
outside of the maximum grid size, an "outside of the grid" factor is applied by multiplying the
number of atoms outside the grid by the maximum interaction energy possible (typically 900)
and adding that value as an additional term in the root sum of squares similarity calculation. The
use of dynamic grid sizing increases the throughput performance of the method considerably. 
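
The grid growth rule above admits more than one reading, and the sketch below adopts only one of them. It assumes that grid sizes are counted as points per axis, that the database grid always starts one unit larger than the query grid, and that the running total adds the points of the query grid and of every successively expanded grid. The limits of 10,000 points and 11 units are taken from the text; the function and constant names are hypothetical.

```python
MAX_ACCUMULATED_POINTS = 10_000  # accumulated point limit stated in the text
MAX_EXPANSION = 11               # maximum growth in units per dimension


def expanded_grid(query_dims):
    """Return (database_grid_dims, accumulated_points) under this reading."""
    nx, ny, nz = query_dims
    expansion = 1  # the initial database grid is one unit larger
    accumulated = nx * ny * nz + (nx + 1) * (ny + 1) * (nz + 1)
    while accumulated <= MAX_ACCUMULATED_POINTS and expansion < MAX_EXPANSION:
        expansion += 1
        accumulated += (nx + expansion) * (ny + expansion) * (nz + expansion)
    dims = (nx + expansion, ny + expansion, nz + expansion)
    return dims, accumulated


mid_sized = expanded_grid((5, 5, 5))  # stops on the point limit
tiny = expanded_grid((1, 1, 1))       # stops on the 11 unit limit
```

Under these assumptions a 5 × 5 × 5 query grid stops growing once the accumulated total passes 10,000 points, while a very small query grid stops at the 11 unit expansion limit instead.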
Whole Molecule Two Piece Comparisons: 

As noted, for a two piece comparison both the query molecule and the database molecule 
are always split into just two pieces at each acyclic bond starting with the whole molecule each 
time. If there are 4 acyclic bonds and the heavy atom count matches the user selected value 
(the default is typically 4), four two fragment pairs will be generated. As an example of the shape
comparison, consider a query molecule which can only be broken at one acyclic bond to form 
fragments A and B. Consider also that a database molecule can only be broken at one acyclic 
bond into fragments C and D. Among the four fragments, there are two sets of comparisons 
possible: A:C & B:D, and A:D & B:C. A first comparison is made between: A:C and B:D. [In 
the actual calculation the squared differences in the field values between each grid location in 
each fragment are kept and the square root is only taken at the end of the comparison process.] 
Thus for the A:C & B:D comparison, a distance is determined as:

$$\sqrt{(A{:}C)^2 + (B{:}D)^2}$$

This value is retained for comparison. For the A:D & B:C comparison, a distance is determined
as:

$$\sqrt{(A{:}D)^2 + (B{:}C)^2}$$

This value is compared to the value determined for the first A:C & B:D set and the lower value 
(greater similarity) retained. Thus, there are two comparisons for each pair of molecules. It has 
been found that generally one will be significantly more similar than the other. The lower (more 
similar) value is retained and compared to the values obtained for the query against every other 
molecule in the database. Ultimately, the molecules in the database which are most similarly 
shaped to the query molecule will be determined by those with the smallest field difference. 

As a further example consider a query molecule which can be broken at four acyclic 
bonds to form four two fragment pairs and a database molecule which can be broken at five 
acyclic bonds to form five two fragment pairs. This may be represented as: 

Query      Database 
A B        I J 
C D        K L 
E F        M N 
G H        O P 
           Q R 

The first comparison will be A:I & B:J and A:J & B:I. A second comparison will be A:K 
& B:L and A:L & B:K. Similar comparisons will be obtained between each query fragment pair 
and each database molecule fragment pair. Of all the comparisons, the one having the smallest 
difference in field value will be kept for further comparison to the values obtained for all the 
molecules in the database. These comparisons are whole molecule comparisons because each 
fragment of the query molecule is compared to each fragment of every database molecule in sets 
of two (representing a complete molecule). 

Whole Molecule Three Piece Comparisons: 

If a three piece fragmentation scheme is employed the same shape comparison principles 
apply but are further complicated by the presence of the central fragment. In two piece 
fragmentation, each fragment has only one attachment bond which may be placed at the 0,0,0 
grid coordinate. There is, therefore, only one topomeric alignment for the fragment. However, 
the central fragment in a three piece fragmentation will have two attachment bonds, one each at 
the points where the two side fragments have been severed. There will, therefore, be two starting 
points for the topomeric alignment which will result in a different topomer shape of the aligned 
fragment. Each of these shapes must be included in the comparison. 

As an example consider a query and a database molecule each of which may be broken into 
three three piece fragmentations: 

Query   Database 
A       J 
B       K 
B'      K' 
C       L 

D       M 
E       N 
E'      N' 
F       O 

G       P 
H       Q 
H'      Q' 
I       R 

The primed fragments represent the second orientation of the central fragment of the 
three. Fields are calculated for all fragments as before. Considering just the first fragment set 
from both the query and database molecules, the first set of distance comparisons is A:J & B:K 
& B':K' & C:L and the distance is: 

$$\sqrt{(A:J)^2 + (C:L)^2 + \left[(B:K)^2 + (B':K')^2\right]/2}$$

The last term takes the average contribution of the center piece. The other possible 
comparison, in which the side fragments are exchanged, is calculated similarly. 

From the two sets of comparisons, the one with the lower field difference (more similar) is 
retained for comparison. All the other comparisons between each three fragment set of the query 
and each three fragment set of the database molecule are calculated and the one with the lowest 
field difference is retained for comparison with those generated for all the other database 
molecules. 

One further complication which arises with three piece fragmentation is that it is 
sometimes necessary to apply an attachment bond penalty to the calculated distance to reflect 
differences in the structure. Since there are two attachment bond points, the spatial relationship 
between those points will influence the shape of the whole molecule. However, considering just 
the fragments will not totally reflect the shape characteristics specified by the spatial relationship 
of the attachment points. The penalty is an attempt to preserve the three dimensional structure of 
the whole molecule. A penalty value is thus added to the shape differences (increasing the apparent 
difference and decreasing the apparent similarity) to compensate. The penalty value is calculated as: 

$$\sqrt{\left[(B:K)^2 + (B':K')^2\right]/2}$$

This penalty value is multiplied by an arbitrary factor depending on the user's belief in the 
significance of the structural difference. The factor is initially set at 10 in the code but might 
be set as high as 100. As an example, consider the ortho, meta, and para positional 
attachment bonds on a ring. The overall molecular shape will vary significantly if two side 
chains are in the ortho versus the para position with respect to each other. Accordingly, for the 
1 atom difference of an ortho relationship, a penalty of 10 would be applied; for the 2 atom 
difference of a meta relationship, a 20 unit penalty would be applied; and for the 3 atom 
difference of a para relationship, a penalty of 30 would be applied. The point is that in 
determining the shape comparisons, a substituent cannot just be moved around the ring and have 
it match without some penalty to reflect the difference in position. 
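
The ortho, meta, and para example can be illustrated with a short sketch. It follows only the worked numbers in the paragraph above (10, 20, and 30 units for 1, 2, and 3 atom differences). The function name is hypothetical, and capping the result at a maximum value is an assumption borrowed from the `max_attach` option in Appendix A, which the listing describes only for core searching.

```c
/*
 * Hypothetical helper. atom_difference is the difference in attachment
 * position (1 for ortho, 2 for meta, 3 for para in the ring example),
 * factor is the user chosen multiplier (10 by default), and max_penalty
 * is an assumed upper bound on the penalty.
 */
double attachment_penalty(int atom_difference, double factor, double max_penalty)
{
    double penalty;

    if (atom_difference < 0)
        atom_difference = -atom_difference;
    penalty = factor * (double) atom_difference;
    return penalty > max_penalty ? max_penalty : penalty;
}
```

With a factor of 10 and a cap of 100 this reproduces the 10, 20, and 30 unit penalties of the example; with the core searching factor of 50 a three atom difference is limited by the cap.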

For large molecules, small changes in the number of atoms in the molecule are less likely 
to affect the overall shape than for small molecules. For effective shape comparisons, large 
structures need to be less sensitive to steric differences while small structures need to be more 
sensitive to steric differences. Experience has shown that there is a pivot point around 25 heavy 
atoms, with structures having more than 25 heavy atoms considered large. Increasing the weighting 
of the steric contributions for small structures and decreasing it for larger structures has been 
found with experimental data sets to cut the number of false positives in half for small structures 
and allow more hits for large structures without eliminating many small structure hits. 
Accordingly, for structures having more than 25 heavy atoms the steric field values calculated 
for each point in the grid may be decreased by as much as 33% (field values multiplied by 
0.67). For structures having fewer than 25 heavy atoms the steric field values calculated for each 
point in the grid may be increased by as much as 100% (field values multiplied by 2.0). A 
non-linear multiple seems to work best. 

In addition to using a variable grid size, another observation leads to a method of 
increasing the effectiveness and throughput of the searching methodology. It has been observed 
that for molecules which have a size difference of over +/- 12 heavy atoms, there is little 
likelihood of finding molecules which match in shape. Consider a query with 20 heavy atoms 
and a database molecule with 33 heavy atoms. Since to start with there will be 13 atoms in the 
database molecule which will not be matched in the query, a large distance (dissimilarity) will 
already be found due to the missing atoms. The likelihood that all of the remaining atoms will 
lie in equivalent positions so that only the missing atoms will contribute to the difference in field 
values (and hence in similarity) is vanishingly small. Experimental runs on known data sets bear 
out this observation. Before any fragmentation is done, the difference in heavy atom count of the 
query and database compound is determined, and, if the difference is greater than 12 heavy 
atoms, the comparison is skipped. 
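
The heavy atom pre-filter can be sketched as follows. The default selection mirrors the `hevDiff` handling in the `main` routine of Appendix A (12 for whole molecule searching, 30 for subset searching, disabled for core searching); the function names themselves are hypothetical.

```c
#include <stdlib.h>

/*
 * Hypothetical helper. Chooses the heavy atom difference limit when the
 * user has not supplied one. A negative result means the filter is
 * disabled.
 */
int default_hev_diff(int core_mode, int subset_mode)
{
    if (core_mode)
        return -1;
    return subset_mode ? 30 : 12;
}

/*
 * Hypothetical helper. Returns 1 when the comparison should be skipped
 * because the heavy atom counts differ by more than hev_diff, else 0.
 */
int skip_comparison(int query_hev, int db_hev, int hev_diff)
{
    if (hev_diff < 0)
        return 0;
    return abs(query_hev - db_hev) > hev_diff;
}
```

For the example in the text, a query of 20 heavy atoms and a database molecule of 33 heavy atoms differ by 13, so the pair is skipped under the default limit of 12.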
Subset Searching: 

As noted above, only part of the shape characteristic of many molecules may be 
responsible for the binding of those molecules to larger biomolecules. Accordingly, a search is 
desired which would find whether any part of the query molecule has the same shape as any part 
of the database molecule. This can be thought of as a partial fragment match. The method of this 
invention directly permits this type of search to be conducted. The query molecule is fragmented 
into two parts and the database molecule is fragmented into three parts in as many different ways 
as possible. Each possible three piece fragmentation yields: 

Query   Database 
E       A 
F       B 
        C 

In order to determine whether any part of the database molecule matches any part of the query 
the following comparisons are done: 

E : A     E : B 
F : B     F : C 

F : A     F : B 
E : B     E : C 

Since the object is to locate any part of the database molecule which is closely similar 
in shape to all parts of the query molecule, the difference in heavy atom count exclusion which 
is applied to whole molecule searching is modified for subset matching. Instead of excluding the 
search if there is a +/- 12 heavy atom difference, for subset searching the exclusion is not 
applied unless there is a +/- 30 heavy atom difference. 


Core Searching: 


In some instances it is desirable to find another core of similar shape to a known core 
upon which a series of molecules may be built. For instance, suppose a patented series of 
compounds can be recognized as built upon a particular core. If that core can be replaced with 
a similarly shaped but chemically different core, it may be possible to construct an entirely new 
series of compounds active at the same site without infringing the patented series. To conduct 
this type of search the core and its two attachment bonds need to be specified. How the 
searcher decides on the core structure is up to the searcher. The core is aligned in its two 
possible topomeric orientations and the fields calculated. The topomerically aligned fields of only 
the central fragments of all possible three piece fragmentations of the database molecules are 
compared to the core fields as A:C & A':C: 


Again, as before in the case of three fragment searching which involves a central 
fragment with two attachment positions, attachment penalties can be assigned to better 
characterize/distinguish the overall molecular shape based on where the attachment bonds are 
placed with respect to each other on the query core structure. For core searching, the penalty 
multiplier is typically set at 50. The molecules identified in the database which have central 
fragments generating the smallest values (greatest similarity) in the comparison to the specified 
core would be examined for possible use as cores. 
Features: 

As noted earlier, there may be some circumstances where the electrostatic field may be 
used in addition to the steric field to characterize the shape of a topomerically aligned fragment. 
A much more useful characterization has been implemented which extends ideas from 
pharmacophore modeling for use in searching heterogeneous databases of compounds. It is well 
recognized that certain characteristic interactions of molecules in addition to shape play an 
important role in determining whether that molecule will bind to a larger biomolecule. 
Complementarity of shape permits the molecules to approach each other closely enough for these 
interactions to take place. In pharmacophore modeling the presence and location of feature 
classes containing molecular characteristics thought important to the binding of the molecule are 
tracked as well as the distances and directions between the features. An absence of any given 
feature in a molecule or a different location is considered to significantly reduce the likelihood 
of that molecule's binding and, thus, typical pharmacophore modeling is an all or nothing 
proposition. Clearly, in the present methodology due to the topomeric alignment of fragments 
all distance and direction attributes of features present in the fragments are lost. 

However, an alternative approach to incorporating the characteristic interactions in 
conjunction with the shape similarity matching described above has proven to generate an 
exceedingly powerful and accurate discovery methodology. The classic five feature classes are 
employed: positive charge, negative charge, hydrogen-bond-donating, hydrogen-bond-accepting, 
and aromatic. When present in either the query molecule or the database molecule, the features 
are assigned X,Y,Z point locations in the topomer alignment either centered on the relevant 
atom, or, in the case of aromatic rings, the centroid of the ring is specified. Generating the 
topomer conformation of a molecular fragment not only fixes the steric shape of that fragment, 
but also fixes the Cartesian coordinates of each pharmacophore feature contained within the 
fragment. The search strategy can be summarized as finding all the database molecule fragments 
which have features, similarly located in topomer space and similar in any other detailed feature 
property, that match each of the features in the topomerized fragments of the query structure. 

In keeping with the distance definitions used for steric shape similarity, differences in 
features are defined with the same dimensionality as shape so that both shape and features can 
be used to characterize a fragment for searching. Feature by feature differences are also 
combined in a root sum square rather than a straight sum fashion. Thus, a second feature 
mismatch would not be as costly as the first one. To determine the feature "distance", each of 
the pharmacophoric features in the query structure is considered in turn, by identifying the 
closest feature of the same pharmacophoric class in the database molecule fragment. If there is 
no such feature or if the nearest such feature is more than 1.5 Å distant, the dissimilarity sum 
of squares is increased by a maximum of 100 x 100 units. (Units are chosen to be commensurate 
with the steric shape units of kcal/mole-Å³.) If there is a matching feature within 0.5 Å, 
the dissimilarity is set to zero. For a feature separation between 0.5 Å and 1.5 Å the 
dissimilarity penalty increment is obtained by linear interpolation between 0 and 100 x 100 unit 
values. Further, it is possible to scale/weight the feature contribution to increase or decrease its 
relative contribution with respect to the steric contribution to the observed similarity (distance). 
Note that the use of the term "distance" with the feature searching methodology of the present 
invention is not meant to refer to an actual physical "distance" as considered in traditional 
pharmacophore techniques. For a two piece fragmentation the distance (similarity) between 
fragments is calculated as: 

$$\sqrt{(A:C)^2_{STERIC} + (B:D)^2_{STERIC} + (A:C)^2_{FEATURE} + (B:D)^2_{FEATURE}}$$

The cross terms for the A:D and B:C comparisons follow a similar definition as earlier. It has 
been observed that if the value of: 

$$\sqrt{(A:C)^2_{FEATURE} + (B:D)^2_{FEATURE}}$$

is too high, the distance will be large (little similarity) and the full calculation including the time 
consuming calculation of the steric field can be skipped. This also increases the effectiveness and 
throughput of the method. 

While the relative weight of each feature's contribution to the field can be varied, in the 
basic method, an attempt is made to match all features in a query with the nearest feature of the 
same class in the database molecule. This is similar to a pharmacophore type match, but there 
is no concern with matching interfeature distances in the topomeric conformation. Further, 
unlike standard pharmacophore searching, the user is able to assign adjustable penalties in the 
event that an exact match is not possible. For instance, a nearby spatial match of one type of 
feature might be more acceptable to the user than a nearby spatial match of another feature. The 
distance penalty for the spatially mismatched first feature could be set much lower than for a 
spatial mismatch of the second feature. The features method also permits handling of 
situations where a feature is present in a database molecule but not in the query molecule. In 
standard pharmacophore technique, this situation would lead to a total mismatch. However, in 
the present method the user can assign a distance (similarity) penalty for the absence of the 
match to the query, but need not totally ignore either the overall shape of the query or the 
contribution of the other features in judging the similarity of the structures. 
Partial Feature Matching: 

It is recognized that very frequently the binding of small molecules to receptors is highly 
dependent on the interaction between hydrogen-bond-donating and hydrogen-bond-accepting 
atoms. For partial feature matching, the search for charged groups and aromatic rings may be 
turned off. A large penalty (10,000 units) is applied for donors and acceptors which do not 
align. In addition, the number of donor or acceptor matches required can be varied. This 
capability is included since it is recognized that frequently only 2 or 3 groups are required to 
make a small molecule active. For partial feature matching, all the hydrogen-bond-donating and 
hydrogen-bond-accepting features are examined but only those generating the lowest 2 or 3 
distances (including applicable penalties) across all (A:C, A:D, B:C, & B:D) the fragment 
comparisons for the compounds are used. 
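
Selecting only the lowest few donor and acceptor penalties might look like the following sketch. The function names are hypothetical, and the penalties are assumed to be squared increments already computed for every hydrogen-bond-donating and hydrogen-bond-accepting feature across the fragment comparisons.

```c
#include <stdlib.h>

/* Ascending comparison of two doubles for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper. Sorts the penalties in place in ascending order
 * and returns the sum of the n_keep smallest values. If n_keep exceeds
 * count, all penalties are summed.
 */
double partial_match_sum_sq(double *penalties, size_t count, size_t n_keep)
{
    double sum = 0.0;
    size_t i;

    qsort(penalties, count, sizeof penalties[0], cmp_double);
    if (n_keep > count)
        n_keep = count;
    for (i = 0; i < n_keep; i++)
        sum += penalties[i];
    return sum;
}
```

With `n_keep` set to 2 or 3, unaligned donors and acceptors beyond the best few no longer dominate the distance, which matches the stated aim of requiring only a small number of matching groups.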

A further variation of the partial feature matching method considers the situation where 
the user determines that there is only one feature which is most important to match. If that 
feature is present and properly located, there is no penalty, the field differences are zero and the 
similarity is great. The flip side of single feature matching is that if the feature doesn't match, 
a very large penalty is imposed to clearly yield a large difference (greater distance and low 
similarity). 

Feature matching has been found to greatly increase the effectiveness of the heterogeneous 
database searching since it complements the shape specific searching. Use of both steric shape 
searching and feature searching of topomerically aligned fragments has been found to be as 
good as or better than any equivalent 2D searching with fingerprints which has been, until now, 
the gold standard of searching technologies. In addition, the results of shape and feature 
similarity searching yield actual molecular structures which chemists recognize as being 
members of the same class of compounds. Also, unlike fragment searching, molecular structures 
are clearly identified which can serve as bases for continued development. 

The method of the present invention for the first time permits the three dimensional 
searching of a heterogeneous compound database for compounds that are likely to have the same 
biological activity as a query molecule. The results identify molecular structures having similar 
shape properties, and, when used with features, similar pharmacophoric properties. The 
identification of the structural fragments which contribute to the identified similarity provides an 
insight into the shape requirements of the receptor, and just as importantly, into likely additional 
molecular structures and corresponding shapes which will likely share the same activity. Thus, 
lead development is more straightforward from a knowledge of the relevant shape characteristics 
of the fragments provided by the method of this patent disclosure than from any two dimensional 
searching technique. 
Output: 

The most commonly used output reports the single best match between the query 
molecule and all molecules in the heterogeneous database. The two or three piece fragment which 
was responsible for the match is also reported. A variation of the output displays the fragment 
of the best hits and the query fragment that it matches. One can also ask the system to list all 
hits with field differences less than some value; in other words, a list of the most similar 
molecules. 

The software code written in the C language contained in the Appendices implements all 
the capacities of the present invention. The CT_TOP.C code provides all the calculation 
functionalities. DBTOP.C contains the command line interface, the user inputs, code to read the 
input structures, calls to the CT_TOP.C routines, and output interface. CT_TOP.H lists all the 
required data structures used. The code needs to be compiled by a standard C compiler before 
being run as is well understood in the art. All together, all code necessary to fully disclose an 
enabling embodiment of the invention in the computational chemistry environment specified 
earlier is set forth in the Appendices. 

From the preceding description of the construction, generation, and searching of a 
heterogeneous database of molecules, it should be clear that there are many variations which 
may be employed and, having taught how to generate and search one specific embodiment, all 
equivalent embodiments are considered within the scope of this disclosure. 

While the preceding written description is provided as an aid in understanding, it should 
be understood that the source code listings appended to this application constitute a complete 
disclosure of the best mode currently known to the inventors of the methods of heterogeneous 
database searching. 

Thus, while this invention has been particularly described with reference to the drug lead 
identification art, it is clear that the validation of molecular structural descriptors and their use 
in selecting structurally diverse sets of chemical compounds can be applied anywhere a large 
number of compounds is encountered from which a representative subset is desired. Since the 
implications and advances in the art provided by the methods of this invention are still so new, 
the entire range of possible uses for the methods of this invention cannot be fully described at 
the present time. However, such as yet unidentified uses are considered to fall under the teachings 
and claims of this invention if validated molecular structural descriptors are employed to 
characterize the diversity of molecules. 
APPENDIX "A" - DBTOP.C 


#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <ctype.h>
#include <time.h>
#include <memory.h>
#include "ct.h"
#include "ct_proto.h"
#include "import_proto.h"
#include "utl_mem.h"
#include "utl_scan.h"
#include "utl_set.h"
#include "comfa.h"
#include "parseopt.h"
#include "ct_top.h"


/* Option variables */
static char *hitlist;
static char *UnityDatabase;
static char *UnitySetName;
static char *QueryFileName;
static char *queryDetailFileName;
static double radius = 120.0;
static int min_atoms = -1;
static int AllowTerminalAtoms = -1;
static double reductionFactor = 0.85;
static double attachmentFactor = -1.0;
static double max_attachpen = 100.0; /* 2x attachmentFactor - about 2 angstroms */
static double featureFactor = 1.0;
static double extraFeaturePenalty = 0.1;
static int stericPivot = 30;
static int partialMatch = 0;
static int useFallback = 1;
static int do2piece = 1;
static int do3piece = 1;
static int doSubset = 0; /* query 2 piece, with structure 3 piece */
static int minHevSubset = -1; /* -1 means to auto adjust, 4 hev atoms less than query */
static int minHev = -1;
static int maxHev = -1;
static int hevDiff = -2;
static int normalize = -1;
static int max_hits = 0;
static int useFeatureCharges = 1;
static char *str_featureWeights;
static char *OutputFileName;
static char *report_modes[] = { "tsv", "tsvd", "regid", "sln", "detail", "core", "matrix", (char *) 0 };
static char *region_modes[] = { "normal", "big", "huge", (char *) 0 };
static char *feature_modes[] = { "unitypref", "unity", "topomer", (char *) 0 };
static FeatureSetName featureSet = UseTopomerFeatures;
static int reportWarnings = 1;
static int regionMode;
static double stepSize = 2.0;
static int debugLevel = 2;
static char *debugFileName;
static int res_alloc;
static double *parseFeatureWeights(char *sptr );
int token_string(char *str, char token, int maxtoks, int skipMult, char **tokens );
static int DoCoreSearching( struct CtConnectionTable *qct, FILE *infp, FILE *outfp );
static int TriposSponge(int cnt);
static double getLoad(char *line);

15 

typedef enum
{
    ReportTSV,
    ReportTSVD,
    ReportRegid,
    ReportSln,
    ReportDetail,
    ReportCore,
    ReportMatrix,
    ReportBrief,
    ReportStats,
} ReportMode;

static ReportMode rmode;

/*
    WARNING: If you add or subtract options before -report adjust REPORT_OFFSET accordingly.
*/

#define FEATURE_SET_OFFSET 15
#define REPORT_OFFSET 28

static struct ParseOptions Options[] = {
    { "hitlist", ParseOptString, &hitlist,
      "Name of a sln hitlist containing structures to search with 3D coordinates." },
    { "database", ParseOptString, &UnityDatabase,
      "Name of a Sybyl/3DB database\n\t Without -database or -hitlist stdin is used." },
    { "use_subset", ParseOptString, &UnitySetName,
      "Name of selection set to use vs entire database." },
    { "query", ParseOptString, &QueryFileName,
      "Name of a file containing the query structure.\n\nField Options\n" },
    { "distance", ParseOptDouble, &radius,
      "maximum shape units distance to report as a hit, default is 120." },
    { "stericpivot", ParseOptInt, &stericPivot,
      "autoscale steric pivot point. Queries having fewer than N heavy atoms are more "
      "sensitive to steric differences,\n\t\t0 is disabled. Default 30." },
    { "partialmatch", ParseOptInt, &partialMatch,
      "donor and acceptor partial match. The lowest N HBD/HBA feature penalties contribute "
      "to the distance,\n\t\t0 is disabled. Default is 0" },
    { "minatoms", ParseOptInt, &min_atoms,
      "minimum number of HEV atoms per fragment, default is 4. (a negative value sets the "
      "minimum number of 2piece splits)" },
    { "terminal", ParseOptBoolean, &AllowTerminalAtoms,
      "Use +terminal to enable the counting of terminal atoms, default -terminal." },
    { "hevdiff", ParseOptInt, &hevDiff,
      "Maximum allowed heavy atom count difference to compare compounds,\n\t\tdefault 12 "
      "inclusive, 30 with +subset, -1 means disabled." },
    { "hev_min", ParseOptInt, &minHev,
      "Minimum number of heavy atoms required in structure to search. Default 10\n" },
    { "hev_max", ParseOptInt, &maxHev,
      "Maximum number of heavy atoms allowed in structure to search. Default 80\n" },
    { "attach", ParseOptDouble, &attachmentFactor,
      "attachment penalty factor for 3 piece comparisons, default 10.0, 50 for core mode" },
    { "max_attach", ParseOptDouble, &max_attachpen,
      "maximum attachment penalty for core searching -report core, default 100.0 " },
    { "feature", ParseOptDouble, &featureFactor,
      "Feature scaling factor, default 1.0" },
    { "usefeatureset", ParseOptEnum, feature_modes,
      "Default is topomer" },
    { "charge", ParseOptBoolean, &useFeatureCharges,
      "use -charge to disable charge group features, they have a high default penalty " },
    { "weight", ParseOptString, &str_featureWeights,
      "Comma separated list of 5 feature weights, aromatic, pos charge groups, neg, HBA, "
      "HBD,\n\t\tdefault 20,200,200,100,100 " },
    { "extra", ParseOptDouble, &extraFeaturePenalty,
      "Extra feature penalty factor applied to feature weight, default 0.1 " },
    { "arom", ParseOptBoolean, &normalize,
      "Default is false for database, true otherwise -arom disables +arom enables " },
    { "agscale", ParseOptDouble, &reductionFactor,
      "Aggregate scaling factor for rotatable bonds, default 0.85." },
    { "2piece", ParseOptBoolean, &do2piece,
      "Use -2piece to disable 2 piece comparisons." },
    { "3piece", ParseOptBoolean, &do3piece,
      "use -3piece to disable 3 piece comparisons." },
    { "subset", ParseOptBoolean, &doSubset,
      "use +subset to enable subset searching.\n\t\tQuery is allowed to hit larger structure "
      "containing a portion of the 2 piece fragmentation." },
    { "stepsize", ParseOptDouble, &stepSize,
      "Step size of the grid points, default 2.0, lower values take longer" },
    { "fallback", ParseOptBoolean, &useFallback,
      "Use -fallback to disable using smaller minimum atoms when no splitting "
      "occurs.\n\nOutput Options\n" },
    { "besthits", ParseOptInt, &max_hits,
      "Will report the compounds with the N lowest shapeunit scores less than or equal to the "
      "-shapeunits value." },
    { "output", ParseOptString, &OutputFileName,
      "Will report results to this filename, default is stdout." },
    { "report", ParseOptEnum, report_modes,
      "Reporting mode, default is TSV " },
    { "qdetail", ParseOptString, &queryDetailFileName,
      "write query fragments to this filename." },
    { "debugFile", ParseOptString, &debugFileName,
      "write debugging information to this file, CAUTION: creates extensive amount of "
      "information per compound" },
};

/* static variables */

static top_result **result_root;
static int result_idx;
static int cnt = 0;
static int nhit = 0;
static time_t tnow;

/* local functions */

static FILE *open_input_source(char *unitydb, char *setname, char *hitlist, int *r_ispipe );
static void saveResult(top_result *res, int max_hits, double *r_radius );
static int top_result_compare(const void *vnrec, const void *vtrec );
static void formatTSV(FILE *fp, struct CtConnectionTable *ct, double comfa_diff, int idx);
static int formatDetail(FILE *fp, top_result *res, int reportHitFrags );
static void formatTSVD(FILE *fp, top_result *res );
static void formatRegid(FILE *fp, struct CtConnectionTable *ct, int idx);
static void writeDetailHeader(FILE *fp, ReportMode rmode);
static void writeTSVDHeader(FILE *fp);
static int echo_hitlistLine(char *line);
static void setAttr(struct CtConnectionTable *ct, char *name, char *value );
static void writeQueryDetails(char *fname );

#if 0
#define CACHE_COUNTERS 1
#endif

int main(int argc, char *argv[] )
{
    FILE *outfp;
    FILE *in_fp;
    FILE *qfp;
    FILE *dfp = (FILE *) 0;
    int isPipe;
    int i;
    struct CtConnectionTable *ct;
    struct CtConnectionTable *qct;
    struct CtConnectionTable *core_qct;
    char *tptr;
    char *sln;
    char *regid;
    int t_frags, t_2compare, t_3compare, t_fcompare, t_filtered, t_feat;
    int nargs;
    double comfa_diff;
    int filtered;
    top_result *res;
    double *cord;
    int natoms;
    int noCordCnt = 0;
    int mixtures = 0;
    int nParts;
    int keepCts;
    top_result *rptr;
    double outsidePerc;
    int queryHevCount;
    int strHevCount;
    int realHevCount;
    int hevFiltered = 0;
    int strHevDiff;
    double *myFeatureWeights;

#ifdef CACHE_COUNTERS
    int e0, e1;
    long long c0, c1;
#endif

#ifdef M_MXFAST
    mallopt(M_MXFAST, 128);
#endif
#ifdef M_BLKSZ
    mallopt(M_BLKSZ, 16*1024);
#endif
#ifdef M_FREEHD
    mallopt(M_FREEHD, 1);
#endif
#ifdef M_MXCHK
    mallopt(M_MXCHK, 100000);
#endif
    nargs = UTL_PARSE_OPT( argc, argv, sizeof(Options) / sizeof(struct ParseOptions), Options );
    if ( ! nargs )
        return -1;

#if 0
    if ( ! LM_STANDALONE_INIT() )
    {
        fprintf(stderr, "License initialization failed.\n" );
        return -1;
    }
    if ( ! LM_STANDALONE_VALID_LICENSE( "QSAR") )
    {
        fprintf(stderr, "A valid QSAR license is required.\n");
        return -1;
    }
#endif

    rmode = ReportTSV;
    if ( Options[REPORT_OFFSET].explicit )
    {
        tptr = *((char **) Options[REPORT_OFFSET].value);
        if ( !strcmp(tptr,"tsv") )
            rmode = ReportTSV;
        else if ( !strcmp(tptr,"tsvd") )
            rmode = ReportTSVD;
        else if ( !strcmp(tptr,"regid") )
            rmode = ReportRegid;
        else if ( !strcmp(tptr,"detail") )
            rmode = ReportDetail;
        else if ( !strcmp(tptr,"sln") )
            rmode = ReportSln;
        else if ( !strcmp(tptr,"core") )
            rmode = ReportCore;
        else if ( !strcmp(tptr,"matrix") )
            rmode = ReportMatrix;
        else
        {
            fprintf(stderr,"Not a valid reporting option: %s\n", tptr );
            return -1;
        }
    }

    if ( Options[FEATURE_SET_OFFSET].explicit )
    {
        tptr = *((char **) Options[FEATURE_SET_OFFSET].value);
        if ( !strcmp(tptr,"topomer") )
            featureSet = UseTopomerFeatures;
        else if ( !strcmp(tptr,"unity") )
            featureSet = UseUnityFeatures;
        else
            featureSet = UsePreferredUnityFeatures;
        fprintf(stderr, "Using %s feature set %d\n", tptr, featureSet );
    }

    if ( hevDiff == -2 )
    {
        if ( rmode == ReportCore )
            hevDiff = -1;
        else if ( doSubset )
            hevDiff = 30;
        else
            hevDiff = 12;
    }

    if ( minHev == -1 )
    {
        if ( rmode == ReportCore )
            minHev = 1;
        else if ( doSubset )
            minHev = 10;
        else
            minHev = 10;
    }

    if ( maxHev == -1 )
    {
        if ( rmode == ReportCore )
            maxHev = 1000;
        else if ( doSubset )
            maxHev = 80;
        else
            maxHev = 80;
    }

    if ( attachmentFactor == -1 )
    {
        if ( rmode == ReportCore )
            attachmentFactor = 50.0;
        else
            attachmentFactor = 10.0;
    }

    if ( min_atoms == -1 )
    {
        if ( rmode == ReportCore )
            min_atoms = 1;
        else
            min_atoms = 4;
    }

    if ( AllowTerminalAtoms == -1 )
    {
        if ( rmode == ReportCore )
            AllowTerminalAtoms = 1;
        else
            AllowTerminalAtoms = 0;
    }

    if ( normalize == -1 ) /* User didn't specify, so auto select based upon input type */
    {
        if ( UnityDatabase )
            normalize = 0;
        else
            normalize = 1;
    }

    if ( !UnityDatabase && !normalize )
        fprintf(stderr,"\nWARNING: Make sure structures in hitlist are in aromatic and standardized form when using -arom\n\n" );
#if 0
    if ( Options[REGION_OFFSET].explicit )
    {
        tptr = *((char **) Options[REGION_OFFSET].value);
        if ( !strcmp(tptr,"normal") )
            regionMode = 0;
        else if ( !strcmp(tptr,"big") )
            regionMode = 1;
        else if ( !strcmp(tptr,"huge") )
            regionMode = 2;
        else
        {
            fprintf(stderr,"not a valid region mode: %s\n", tptr );
            return -1;
        }
    }
#endif

    if ( stepSize < 1.5 || stepSize > 2.5 )
    {
        fprintf(stderr,"You must be kidding on this stepsize. Please keep between 1.5 and 2.5.\n" );
    }

#if 0
    TOP_STER_REGION_MODE(regionMode);
#endif
#if 0
    if ( rmode != ReportTSV )
    {
        fprintf(stderr,"other report options not supported, see -debugFile \n");
        fprintf(stderr,"What formatting options do you want? \n" );
        goto bailout;
    }
#endif

    if ( !QueryFileName && rmode != ReportMatrix )
    {
        fprintf(stderr,"No query file specified.\n");
        return -1;
    }

    qfp = (FILE *) 0;
    if ( rmode != ReportMatrix )
    {
        qfp = fopen(QueryFileName, "r");
        if ( !qfp )
        {
            fprintf(stderr, "Failed to open query file:%s\n", QueryFileName );
            return -1;
        }
    }

    if ( debugFileName )
    {
        dfp = fopen(debugFileName,"w");
        if ( dfp )
        {
            fprintf(dfp,"#SYBYL/3DB HITLIST\n#@CLASS STRLIST\n");
            fprintf(dfp,"#@FIELD TS_SID INT\n");
            fprintf(dfp,"#@FIELD TS_QID INT\n");
        }
    }

    if ( str_featureWeights )
        myFeatureWeights = parseFeatureWeights(str_featureWeights);
    else
        myFeatureWeights = (double *) 0;

    in_fp = open_input_source(UnityDatabase, UnitySetName, hitlist, &isPipe );
    if ( !in_fp )
    {
        return -1;
    }

    if ( OutputFileName )
    {
        outfp = fopen(OutputFileName, "w");
        if ( !outfp )
        {
            fprintf(stderr, "Failed to open %s for output\n", OutputFileName );
            goto bailout;
        }
    }
    else
        outfp = stdout;


    keepCts = 0;
    if ( rmode == ReportDetail )
        keepCts = 1;
    if ( rmode == ReportDetail || rmode == ReportSln )
    {
        writeDetailHeader(outfp, rmode );
    }
    else if ( rmode == ReportTSVD )
        writeTSVDHeader(outfp);
    else if ( rmode == ReportTSV )
        fprintf(outfp, "TOPSIM\n");

    /* Read the first non-comment SLN in the query file as the query structure */
    qct = (struct CtConnectionTable *) 0;
    while ( qfp && !qct && UTL_SCAN_GETS(qfp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
            continue;
        qct = DB_IMPORT_SLN(sln);
        if ( qct )
            queryHevCount = TOP_HEV_COUNT(qct);
    }

    if ( qfp && !qct )
    {
        fprintf(stderr,"No query contained in :%s\n", QueryFileName );
bailout:
        if ( isPipe )
            pclose(in_fp);
        return -1;
    }

40 

    if ( rmode == ReportCore )
    {
        core_qct = qct;
        qct = (struct CtConnectionTable *) 0;
    }

    if ( TOP_QUERY_OPTIONS(qct, do2piece, do3piece, doSubset, min_atoms, stericPivot, partialMatch,
                           AllowTerminalAtoms, useFallback, hevDiff, 0, reductionFactor, featureFactor,
                           attachmentFactor, stepSize,
                           featureSet, useFeatureCharges, myFeatureWeights, extraFeaturePenalty, dfp,
                           debugLevel) && qct )
    {
        fprintf(stderr, "Failed to setup topomer searching for query.\n");
        fprintf(stderr, "Most likely no 3D coordinates or cannot split query.\n");
        goto bailout;
    }

    if ( rmode == ReportCore )
    {
        DoCoreSearching(core_qct, in_fp, outfp );
        qct = core_qct;
        goto closeup;
    }

    if ( rmode == ReportMatrix )
    {
        DoMatrixSearching(in_fp, outfp);
        goto closeup;
    }

    if ( qct && queryDetailFileName )
    {
        writeQueryDetails(queryDetailFileName);
    }

#ifdef CACHE_COUNTERS
    e0 = 1;
    e1 = 25; /* 26 L2 data cache, 25 L1 data cache, see perfex */
    start_counters(e0, e1);
#endif

    while ( UTL_SCAN_GETS(in_fp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
        {
            if ( rmode == ReportDetail && echo_hitlistLine(sln) )
                DB_CT_SLN_WRITE(outfp, sln );
            continue;
        }

        cnt++;
        ct = (struct CtConnectionTable *) 0;

        /* Cheap heavy atom pre-filter on the SLN text, before importing the structure */
        if ( hevDiff >= 0 )
        {
            strHevCount = slnHevCount(sln);
            strHevDiff = queryHevCount - strHevCount;
            if ( strHevDiff < 0 )
                strHevDiff *= -1;
            if ( strHevDiff > hevDiff || strHevCount < minHev || strHevCount > maxHev )
                hevFiltered++;
            else
                ct = DB_IMPORT_SLN(sln);
        }
        else
            ct = DB_IMPORT_SLN(sln);

        if ( !(cnt % 1000) )
        {
#ifdef CACHE_COUNTERS
            read_counters(e0, &c0, e1, &c1 );
            start_counters(e0, e1);
            fprintf(stderr, "cache miss rate: %8.3lf\n",
                    (double) ( ( (long double) c1 / (long double) c0 ) ) * 10000.0 );
#endif
#ifdef TRIPOS_VERSION
            TOP_GET_STATS(!(cnt % 10000), &t_frags, &t_2compare, &t_3compare,
                          &t_fcompare, &t_filtered, &t_feat, &outsidePerc);
#else
            TOP_GET_STATS(0, &t_frags, &t_2compare, &t_3compare, &t_fcompare,
                          &t_filtered, &t_feat, &outsidePerc);
#endif
#if 0
            if ( outsidePerc > 10.0 )
            {
                fprintf(stderr, "Warning %8.4lf percent of the fields evaluated have atoms outside the field, try using a larger field.\n",
                        outsidePerc );
            }
#endif
            time(&tnow);
            fprintf(stderr,"hit %3d of %4d filtered %4d (%d+ %d+ %d+ %d, No3D+Mix+Hev+Feat) out:%6.3lf Avg Frags: %7.2lf & Comparisons: %7.2lf %s",
                    nhit, cnt, noCordCnt + mixtures + hevFiltered + t_feat, noCordCnt,
                    mixtures, hevFiltered, t_feat, outsidePerc,
                    (double) t_frags / (double) cnt, (double) t_fcompare / (double) cnt,
                    ctime(&tnow) );
#if 0
            fprintf(stderr, "completed: %d no3D: %d mixtures: %d frags: %d comparisons: %d %d %d %8.4lf %8.4lf %8.4lf %8.4lf\n",
                    cnt, noCordCnt, mixtures,
                    t_frags, t_2compare, t_3compare, t_fcompare,
                    (double) t_frags / (double) cnt,
                    (double) t_2compare / (double) cnt,
                    (double) t_3compare / (double) cnt,
                    (double) t_fcompare / (double) cnt );
#endif
        }

        if ( !ct )
            continue;

        cord = (double *) 0;
        DB_CT_GET_CT_ATTR(ct, CtCt3DCoordSet, &cord, &natoms );
        if ( !cord )
        {
            DB_CT_DELETE_CT(ct);
            if ( dfp )
                fprintf(dfp, "# compound %d missing cordinates\n", cnt );
            noCordCnt++;
            continue;
        }

        /* Skip mixtures: only single-component structures are compared */
        DB_CT_UTL_COUNT_FRAGS(ct, 0, (int *) 0, 0, (int *) 0, &nParts );
        if ( nParts != 1 )
        {
            DB_CT_DELETE_CT(ct);
            mixtures++;
            continue;
        }

        if ( normalize )
        {
            DB_CT_NORM_AROM(ct);
            DB_CT_STANDARD(ct, (int *) 0);
        }
        UTL_ERROR_CLEAR();

        if ( max_hits > 0 )
        {
            res = TOP_COMPARE_WDETAIL(ct, radius, cnt, keepCts);
            if ( res )
            {
                nhit++;
                saveResult(res, max_hits, &radius );
            }
            else
                DB_CT_DELETE_CT(ct);
        }
        else if ( rmode == ReportDetail || rmode == ReportTSVD || rmode == ReportSln )
        {
            res = TOP_COMPARE_WDETAIL(ct, radius, cnt, keepCts );
            if ( res )
            {
                nhit++;
                if ( rmode == ReportDetail )
                    formatDetail(outfp, res, 1 );
                else if ( rmode == ReportSln )
                    formatDetail(outfp, res, 0 );
                else
                    formatTSVD(outfp, res);
                TOP_FREE_RESULT(res, 1);
            }
            DB_CT_DELETE_CT(ct);
        }
        else
        {
            comfa_diff = TOP_COMPARE( ct, radius, &filtered, cnt );
            if ( comfa_diff >= 0.0 && ( comfa_diff <= radius || radius < 0.0 ) )
            {
                nhit++;
                if ( rmode == ReportTSV )
                    formatTSV(outfp, ct, comfa_diff, cnt );
                else /* if ( rmode == ReportRegid ) */
                    formatRegid(outfp, ct, cnt );
            }
            DB_CT_DELETE_CT(ct);
        }
    }

#ifdef TRIPOS_VERSION
    TOP_GET_STATS(1, &t_frags, &t_2compare, &t_3compare, &t_fcompare, &t_filtered, &t_feat,
                  &outsidePerc);
#else
    TOP_GET_STATS(0, &t_frags, &t_2compare, &t_3compare, &t_fcompare, &t_filtered, &t_feat,
                  &outsidePerc);
#endif
    time(&tnow);
    fprintf(stderr,"hit %3d of %4d filtered %4d (%d+ %d+ %d+ %d, No3D+Mix+Hev+Feat) out:%6.3lf Avg Frags: %7.2lf & Comparisons: %7.2lf %s",
            nhit, cnt, noCordCnt + mixtures + hevFiltered + t_feat, noCordCnt, mixtures,
            hevFiltered, t_feat, outsidePerc,
            (double) t_frags / (double) cnt, (double) t_fcompare / (double) cnt,
            ctime(&tnow) );

    if ( max_hits > 0 )
    {
        if ( result_idx > 1 && result_idx != max_hits )
            qsort( (void *) result_root, (size_t) result_idx, (size_t) sizeof(top_result *),
                   top_result_compare );
        for ( i = 0; i < max_hits && i < result_idx; i++ )
        {
            res = result_root[i];
            if ( !res )
                continue;
            if ( rmode == ReportTSV )
                formatTSV(outfp, res->ct, res->comfa_diff, res->idx);
            else if ( rmode == ReportTSVD )
                formatTSVD(outfp, res );
            else if ( rmode == ReportRegid )
                formatRegid(outfp, res->ct, res->idx );
            else if ( rmode == ReportDetail )
                formatDetail(outfp, res, 1 );
            else if ( rmode == ReportSln )
                formatDetail(outfp, res, 0 );
        }
    }

    for ( i = 0; i < res_alloc; i++ )
    {
        rptr = result_root[i];
        if ( !rptr )
            continue;
        if ( rptr->ct )
            DB_CT_DELETE_CT(rptr->ct);
        TOP_FREE_RESULT(rptr, 1);
        result_root[i] = (top_result *) 0;
    }

closeup:
    if ( qct )
        DB_CT_DELETE_CT(qct);
    if ( isPipe )
        pclose(in_fp);
    else if ( in_fp != stdin )
        fclose(in_fp);
    if ( dfp )
        fclose(dfp);
    if ( outfp != stdout )
        fclose(outfp);
    if ( rmode != ReportMatrix )
        dump_frag_stats();
    return 0;
}


static FILE *open_input_source(char *unitydb, char *setname, char *hitlist, int *r_ispipe )
{
    char *command;
    int len;
    FILE *fp;

    if ( unitydb )
    {
        /* Stream structures out of a UNITY database through a dbexport pipe */
        len = strlen(unitydb) + 128;
        if ( setname )
            len += strlen(setname);
        command = malloc(len);
        if ( setname )
            sprintf(command,"dbexport -database %s -use_set %s -query regid+coords -visual '*'",
                    unitydb, setname );
        else
            sprintf(command,"dbexport -database %s -query regid+coords -visual '*'",
                    unitydb );
        fp = popen(command,"r");
        if ( !fp )
            fprintf(stderr,"Failed to start the command :\n%s\n", command );
        else
            *r_ispipe = 1;
        free(command);
        return fp;
    }

    if ( hitlist && strcmp(hitlist,"-") )
    {
        fp = fopen(hitlist,"r");
        if ( !fp )
            fprintf(stderr, "Failed to open the hitlist: %s\n", hitlist );
        *r_ispipe = 0;
        return fp;
    }

    *r_ispipe = 0;
    return stdin;
}


static int top_result_compare(const void *vnrec, const void *vtrec )
{
    top_result **n = (top_result **) vnrec;
    top_result **t = (top_result **) vtrec;
    double cdiff;

    cdiff = (*n)->comfa_diff - (*t)->comfa_diff;
    if ( cdiff > 0.0 )
        return 1;
    else if ( cdiff < 0.0 )
        return -1;
    return (*t)->idx - (*n)->idx;
}
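
/* Illustrative sketch, not part of the original listing: the ordering rule of
   top_result_compare (ascending comfa_diff, ties broken by descending idx)
   applied with qsort to an array of pointers. The type demo_result and the
   functions demo_result_compare and demo_sort_results are hypothetical
   stand-ins that carry only the two fields the comparator reads. */

```c
#include <stdlib.h>

/* Hypothetical stand-in for top_result: only the fields used for ordering. */
typedef struct {
    double comfa_diff;  /* shape distance, lower is better */
    int idx;            /* position of the structure in the input */
} demo_result;

/* Ascending distance; equal distances are ordered by descending index. */
static int demo_result_compare(const void *va, const void *vb)
{
    const demo_result *a = *(const demo_result * const *) va;
    const demo_result *b = *(const demo_result * const *) vb;

    if (a->comfa_diff > b->comfa_diff)
        return 1;
    if (a->comfa_diff < b->comfa_diff)
        return -1;
    return b->idx - a->idx;
}

/* Sort an array of result pointers in place, best hit first. */
static void demo_sort_results(demo_result **results, size_t n)
{
    qsort(results, n, sizeof(demo_result *), demo_result_compare);
}
```

/* Because the array holds pointers, the comparator receives pointers to
   pointers and dereferences once before reading the fields. */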
static void saveResult(top_result *res, int max_hits, double *r_radius )
{
    static int res_max;
    top_result *rptr;
    int i;
    static char *suffix[] = { "th", "st", "nd", "rd" };
    int sidx;

    if ( !result_root )
    {
        res_max = max_hits;
        res_alloc = max_hits + 5 + max_hits / 10; /* a little extra */
        result_root = (top_result **) calloc(res_alloc, sizeof(top_result *) );
    }

    if ( res )
    {
        result_root[result_idx] = res;
        result_idx++;
        if ( result_idx == res_alloc )
        {
            /* Buffer is full: sort, free everything beyond the best res_max hits */
            qsort( (void *) result_root, (size_t) res_alloc, (size_t) sizeof(top_result *),
                   top_result_compare );
            for ( i = res_max; i < res_alloc; i++ )
            {
                rptr = result_root[i];
                if ( !rptr )
                    continue;
                if ( rptr->ct )
                    DB_CT_DELETE_CT(rptr->ct);
                TOP_FREE_RESULT(rptr, 1);
                result_root[i] = (top_result *) 0;
            }
            result_idx = res_max; /* start finding a few more to add in */
            rptr = result_root[res_max-1];
            if ( *r_radius && *r_radius > 0.0 && rptr->comfa_diff < *r_radius )
            {
                sidx = 0;
                if ( res_max < 4 )
                    sidx = res_max;
                fprintf(stderr, "%d%s lowest shape distance: %8.2lf old: %8.2lf after: %d\n",
                        res_max, suffix[ sidx ],
                        rptr->comfa_diff, *r_radius, cnt );
                *r_radius = rptr->comfa_diff;
            }
        }
    }
}
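
/* Illustrative sketch, not part of the original listing: the bounded
   "keep the best N" buffering idea used by saveResult, reduced to plain
   distances. Hits are appended to a buffer slightly larger than N; when the
   buffer fills it is sorted, truncated to N, and the cutoff radius is
   tightened to the N-th best distance. All names (demo_topn, demo_topn_init,
   demo_topn_add, demo_topn_finish) are hypothetical. As a simplifying
   assumption the sketch rejects values above the cutoff itself and tightens
   the cutoff even when none was set, whereas the listing leaves rejection to
   the comparison routine and only tightens a positive radius. max_hits is
   assumed to be greater than zero. */

```c
#include <stdlib.h>

typedef struct {
    double *dist;    /* buffered distances */
    int max_hits;    /* number of hits to keep */
    int alloc;       /* buffer capacity: max_hits plus some slack */
    int used;        /* entries currently buffered */
    double radius;   /* current cutoff; negative means no cutoff yet */
} demo_topn;

static int demo_cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/* Returns 0 on success, -1 if the buffer could not be allocated. */
static int demo_topn_init(demo_topn *t, int max_hits)
{
    t->max_hits = max_hits;
    t->alloc = max_hits + 5 + max_hits / 10;   /* same slack rule as the listing */
    t->used = 0;
    t->radius = -1.0;
    t->dist = (double *) calloc((size_t) t->alloc, sizeof(double));
    return t->dist ? 0 : -1;
}

static void demo_topn_add(demo_topn *t, double d)
{
    if (t->radius >= 0.0 && d > t->radius)
        return;                        /* cannot enter the top N any more */
    t->dist[t->used++] = d;
    if (t->used == t->alloc) {
        qsort(t->dist, (size_t) t->alloc, sizeof(double), demo_cmp_double);
        t->used = t->max_hits;         /* drop the slack entries */
        t->radius = t->dist[t->max_hits - 1];
    }
}

/* Final sort; returns how many of the leading entries are valid results. */
static int demo_topn_finish(demo_topn *t)
{
    qsort(t->dist, (size_t) t->used, sizeof(double), demo_cmp_double);
    return t->used < t->max_hits ? t->used : t->max_hits;
}
```

/* Sorting only when the slack is exhausted keeps the cost per hit low while
   still letting the cutoff shrink as better hits arrive. */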

static void setAttr(struct CtConnectionTable *ct, char *name, char *value )
{
    char *tval;

    tval = (char *) 0;
    DB_CT_GET_CT_ATTR(ct, CtCtUserValue, &tval, name );
    if ( tval )
        DB_CT_UTL_MOD_SIMPLE_CT_ATTR(ct, CtCtUserValue, value, name );
    else
        DB_CT_SET_CT_ATTR(ct, CtCtUserValue, value, name );
    UTL_ERROR_CLEAR();
}


static int formatDetail(FILE *fp, top_result *res, int reporthitFrags )
{
    char name[40];
    char value[40];
    int i;
    int noSub;
    struct CtConnectionTable *ct;

    if ( !fp || !res || !res->ct )
        return -1;

    ct = res->ct;

    sprintf(value,"%d", (int) res->comfa_diff );
    setAttr(ct,"TOPSIM", value );

    sprintf(value,"%d", (int) res->best2 );
    setAttr(ct,"TS_2P", value );

    sprintf(value,"%d", (int) res->best3 );
    setAttr(ct, "TS_3P", value );

    if ( doSubset )
    {
        sprintf(value,"%d", (int) res->bestSub );
        setAttr(ct, "TS_SUBSET", value );
    }

    if ( res->best3 < res->best2 )
        noSub = 3;
    else
        noSub = 2;

    if ( !reporthitFrags )
    {
        for ( i = 0; i < 3; i++ )
        {
            sprintf(value,"%d", res->qids[i] + 1 );
            sprintf(name,"TS_QID%d", i+1 );
            setAttr(ct, name, value );

            sprintf(value,"%d", res->strids[i] + 1 );
            sprintf(name,"TS_SID%d", i+1 );
            setAttr(ct, name, value);
        }
        for ( i = 0; i < noSub; i++ )
        {
            sprintf(value,"%8.4lf", res->hexDiffs[i] );
            sprintf(name,"TS_S%d", i+1 );
            setAttr(ct, name, value);
        }
        for ( i = 0; i < noSub; i++ )
        {
            sprintf(value,"%8.4lf", res->featureDiffs[i] );
            sprintf(name,"TS_F%d", i+1 );
            setAttr(ct, name, value);
        }
    }

    if ( res->attachmentPenalty != 0.0 )
    {
        sprintf(value,"%8.3lf", res->attachmentPenalty );
        setAttr(ct, "TS_ATTACH_PEN", value );
    }

    DB_CT_WRITE(fp, ct );

    if ( reporthitFrags )
    {
        for ( i = 0; i < noSub; i++ )
        {
            ct = res->strFrags[i];
            if ( !ct )
                continue;

            sprintf(value,"%8.4lf", res->hexDiffs[i] );
            setAttr(ct, "TS_STERIC", value );

            sprintf(value,"%8.4lf", res->featureDiffs[i] );
            setAttr(ct, "TS_FEATURE", value );

            sprintf(value,"%d", res->qids[i] + 1 );
            setAttr(ct, "TS_QID", value );

            sprintf(value,"%d", res->strids[i] + 1 );
            setAttr(ct, "TS_SID", value );

            sprintf(value,"%d", res->outside[i] );
            setAttr(ct, "TS_OUTR", value );

            DB_CT_WRITE(fp, ct );
        }
    }

    return 0;
}


static void formatTSV(FILE *fp, struct CtConnectionTable *ct, double comfa_diff, int idx)
{
    char *regid;

    regid = (char *) 0;
    if ( ct )
    {
        DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
        if ( !regid )
            DB_CT_GET_CT_ATTR(ct, CtCtName, &regid );
    }
    if ( regid )
        fprintf(fp, "%s\t%d\n", regid, (int) comfa_diff );
    else
        fprintf(fp, "Str%d\t%d\n", idx, (int) comfa_diff );
}

static void formatRegid(FILE *fp, struct CtConnectionTable *ct, int idx)
{
    char *regid;

    regid = (char *) 0;
    if ( ct )
        DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid ); /* Don't get name, only regid */
    if ( regid )
        fprintf(fp, "%s\n", regid);
    else
        fprintf(fp, "Str%d\n", idx);
}


static void formatTSVD(FILE *fp, top_result *res )
{
    char *regid;
    char tmpname[20];

    regid = (char *) 0;
    if ( res->ct )
    {
        DB_CT_GET_CT_ATTR(res->ct, CtCtRegId, &regid );
        if ( !regid )
            DB_CT_GET_CT_ATTR(res->ct, CtCtName, &regid );
    }

    if ( !regid )
    {
        sprintf(tmpname,"Str%d", res->idx );
        regid = tmpname;
    }

    if ( doSubset )
        fprintf(fp,
                "%s\t%d\t%d\t%d\t%d\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\n",
                regid,
                (int) res->comfa_diff, (int) res->best2, (int) res->best3, (int) res->bestSub,
                res->hexDiffs[0], res->hexDiffs[1], res->hexDiffs[2],
                res->attachmentPenalty,
                res->featureDiffs[0], res->featureDiffs[1], res->featureDiffs[2] );
    else
        fprintf(fp,
                "%s\t%d\t%d\t%d\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\t%8.4lf\n",
                regid,
                (int) res->comfa_diff, (int) res->best2, (int) res->best3,
                res->hexDiffs[0], res->hexDiffs[1], res->hexDiffs[2],
                res->attachmentPenalty,
                res->featureDiffs[0], res->featureDiffs[1], res->featureDiffs[2] );
}


static void writeDetailHeader(FILE *fp, ReportMode rmode)
{
    time(&tnow);

    fprintf(fp, "#SYBYL/3DB HITLIST\n#\n");
    fprintf(fp, "# Created: %s", ctime(&tnow) );
    fprintf(fp, "#\n#@CLASS STRLIST\n#\n");

    fprintf(fp, "#@FIELD TOPSIM\tINT\n");
    fprintf(fp, "#@FIELD TS_2P\tINT\n");
    fprintf(fp, "#@FIELD TS_3P\tINT\n");
    if ( doSubset )
        fprintf(fp, "#@FIELD TS_SUBSET\tINT\n");
    if ( rmode == ReportDetail )
    {
        fprintf(fp, "#@FIELD TS_STERIC\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_FEATURE\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_QID\tINT\n");
        fprintf(fp, "#@FIELD TS_SID\tINT\n");
        fprintf(fp, "#@FIELD TS_OUTR\tINT\n");
    }
    else
    {
        fprintf(fp, "#@FIELD TS_S1\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_S2\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_S3\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_F1\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_F2\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_F3\tDOUBLE\n");
        fprintf(fp, "#@FIELD TS_QID1\tINT\n");
        fprintf(fp, "#@FIELD TS_SID1\tINT\n");
        fprintf(fp, "#@FIELD TS_QID2\tINT\n");
        fprintf(fp, "#@FIELD TS_SID2\tINT\n");
        fprintf(fp, "#@FIELD TS_QID3\tINT\n");
        fprintf(fp, "#@FIELD TS_SID3\tINT\n");
    }
    fprintf(fp, "#@FIELD TS_ATTACH_PEN\tDOUBLE\n");
}

static void writeTSVDHeader(FILE *fp)
{
    if ( doSubset )
        fprintf(fp,"TOPSIM\tTS_2P\tTS_3P\tTS_SUBSET\tTS_S1\tTS_S2\tTS_S3\tTS_ATTACH_PEN\tFS_F1\tFS_F2\tFS_F3\n" );
    else
        fprintf(fp,"TOPSIM\tTS_2P\tTS_3P\tTS_S1\tTS_S2\tTS_S3\tTS_ATTACH_PEN\tFS_F1\tFS_F2\tFS_F3\n" );
}

static int echo_hitlistLine(char *line)
{
    char *tptr;
    static char *keep_fields[] = { "FIELD", "DATABASE", "QUERY", "CORE", (char *) 0 };
    int i;

    /* Only "#@..." header lines are candidates for echoing */
    if ( *line != '#' || *(line+1) != '@' )
        return 0;

    tptr = line + 2;
    if ( !*tptr )
        return 0;

    for ( i = 0; keep_fields[i]; i++ )
    {
        if ( !strncmp(tptr, keep_fields[i], strlen(keep_fields[i]) ) )
            return 1;
    }

    return 0;
}
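
/* Illustrative sketch, not part of the original listing: a standalone version
   of the header-line filter above that can be compiled and tried without the
   rest of the program. The name demo_keep_header_line is hypothetical; the
   keyword list is copied from echo_hitlistLine. */

```c
#include <string.h>

/* Returns 1 when a line of the form "#@KEYWORD..." starts with one of the
   keywords to keep, 0 otherwise. */
static int demo_keep_header_line(const char *line)
{
    static const char *keep[] = { "FIELD", "DATABASE", "QUERY", "CORE", NULL };
    int i;

    if (line[0] != '#' || line[1] != '@')
        return 0;
    for (i = 0; keep[i]; i++) {
        /* Prefix match only, so "#@FIELD TS_QID" and "#@FIELDS" both pass. */
        if (strncmp(line + 2, keep[i], strlen(keep[i])) == 0)
            return 1;
    }
    return 0;
}
```

/* For example, a "#@FIELD" line would be echoed to the detail report while a
   "#@CLASS" line or a plain "# Created:" comment would be dropped. */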


static void writeQueryDetails(char *fname )
{
    time_t tnow;
    FILE *fp;

    fp = fopen(fname,"w");
    if ( !fp )
    {
        fprintf(stderr, "Unable to write to query detail filename: %s\n", fname );
        return;
    }

    time(&tnow);

    fprintf(fp, "#SYBYL/3DB HITLIST\n#\n");
    fprintf(fp, "# Created: %s", ctime(&tnow) );
    fprintf(fp, "#\n#@CLASS STRLIST\n#\n");

    fprintf(fp, "#@FIELD TS_QID\tINT\n");

    TOP_QUERY_DUMP(fp, "TS_QID");
    fclose(fp);
}

static int slnHevCount(char *sln)
{
    char *tptr;
    int inbrace = 0;
    int hevCount = 0;

    tptr = sln;
    while ( *tptr )
    {
        if ( *tptr == '[' )
        {
            /* Skip bracketed attributes, including any quoted strings inside */
            while ( *tptr && *tptr != ']' )
            {
                if ( *tptr == '"' )
                {
                    tptr++;
                    while ( *tptr && *tptr != '"' )
                        tptr++;
                    if ( *tptr )
                        tptr++;
                }
                else
                    tptr++;
            }
            if ( !*tptr ) /* unterminated bracket: do not step past the end */
                break;
        }
        /* Each element symbol starts with one upper case letter; skip hydrogens */
        if ( isupper(*tptr) && *tptr != 'H' )
            hevCount++;
        if ( *tptr == '<' ) /* start of the structure attributes, atoms are done */
            return hevCount;
        tptr++;
    }
    return hevCount;
}
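
/* Illustrative sketch, not part of the original listing: the same text-based
   heavy atom count written as a standalone function so that its behaviour can
   be checked on a few SLN strings. The name demo_sln_hev_count is
   hypothetical. The count is a heuristic over the characters of the string:
   every upper case letter other than 'H' outside square brackets is taken as
   the start of one heavy atom symbol, and scanning stops at the first '<'. */

```c
#include <ctype.h>

static int demo_sln_hev_count(const char *sln)
{
    const char *p = sln;
    int count = 0;

    while (*p) {
        if (*p == '[') {
            /* Skip a bracketed attribute block, honouring quoted strings. */
            while (*p && *p != ']') {
                if (*p == '"') {
                    p++;
                    while (*p && *p != '"')
                        p++;
                    if (*p)
                        p++;
                } else {
                    p++;
                }
            }
            if (!*p)
                break;              /* unterminated bracket */
        }
        if (isupper((unsigned char) *p) && *p != 'H')
            count++;                /* "Cl" or "Br" count once: second letter is lower case */
        if (*p == '<')
            break;                  /* structure attributes follow, atoms are done */
        p++;
    }
    return count;
}
```

/* Counting on the raw text lets the caller reject structures whose size is
   far from the query before paying for a full structure import. */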

static double *parseFeatureWeights(char *sptr )
{
    static double weights[6];
    char *tokens[7];
    int ntoks;
    int i;

    ntoks = token_string(sptr, ',', 6, 0, tokens );

    if ( ntoks != 5 )
    {
        fprintf(stderr, "Invalid argument to -weights, please specify 5 weights for arom,neg,pos,HBA,HBD\n");
        exit(-1);
    }

    for ( i = 0; i < 5; i++ )
    {
        weights[i] = atof(tokens[i]);
        if ( weights[i] < 0.0 )
            weights[i] = 0.0;
    }

    return weights;
}
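
/* Illustrative sketch, not part of the original listing: parsing five
   comma-separated feature weights (arom,neg,pos,HBA,HBD) without modifying
   the input string. The name demo_parse_weights and the macro
   DEMO_NUM_WEIGHTS are hypothetical. Unlike the atof-based routine above,
   this sketch uses strtod and treats empty or non-numeric fields as an
   error, which is an assumption rather than the listing's behaviour;
   negative weights are clamped to 0.0 as in the listing. */

```c
#include <stdlib.h>

#define DEMO_NUM_WEIGHTS 5

/* Returns 0 and fills out[] on success, -1 on a malformed argument. */
static int demo_parse_weights(const char *text, double out[DEMO_NUM_WEIGHTS])
{
    const char *p = text;
    char *end;
    int n = 0;

    for (;;) {
        double w;

        if (n == DEMO_NUM_WEIGHTS)
            return -1;              /* more than five fields */
        w = strtod(p, &end);
        if (end == p)
            return -1;              /* empty or non-numeric field */
        out[n++] = (w < 0.0) ? 0.0 : w;
        if (*end == '\0')
            break;                  /* consumed the whole string */
        if (*end != ',')
            return -1;              /* unexpected separator */
        p = end + 1;
    }
    return (n == DEMO_NUM_WEIGHTS) ? 0 : -1;
}
```

/* Returning an error code instead of calling exit leaves the decision about
   how to report a bad -weights argument to the caller. */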


/* returns the number of tokens found
   The string str will be modified, tokens will be modified to the null character
*/
int token_string(char *str, char token, int maxtoks, int skipMult, char **tokens )
{
    char *tptr;
    int ntoks;
    int len, idx;
    int intok = 0;

    /* Replace every separator with a terminator, remembering the original length */
    for ( len = 0, tptr = str; *tptr; tptr++, len++ )
    {
        if ( *tptr == token )
            *tptr = '\0';
    }

    ntoks = idx = 0;
    tptr = str;

    if ( !skipMult )
    {
        tokens[0] = str;
        ntoks = 1;
    }

    while ( ntoks < maxtoks && idx < len )
    {
        if ( skipMult )
        {
            /* Runs of separators collapse: a token starts at the first non-null character */
            if ( *tptr )
            {
                if ( !intok )
                {
                    tokens[ntoks++] = tptr;
                    intok = 1;
                }
            }
            else
            {
                intok = 0;
            }
        }
        else
        {
            /* Every separator starts a new, possibly empty, token */
            if ( *tptr == '\0' )
                tokens[ntoks++] = tptr + 1;
        }
        idx++;
        tptr++;
    }

    return ntoks;
}

static int DoCoreSearching( struct CtConnectionTable *qct, FILE *infp, FILE *outfp )
{
    int cnt = 0;
    int nhit = 0;
    double *cord;
    char *sln;
    struct CtConnectionTable *ct;
    top_result *res;
    int natoms;
    int nParts;
    int hasCore;
    int err;
    static FILE *corefp;
    static int reportCores = -1;
    char *regid;

    if ( reportCores == -1 )
    {
        reportCores = 0;
        if ( (sln = getenv("DBTOP_CORES")) )
        {
            corefp = fopen(sln,"w");
            if ( !corefp )
                fprintf(stderr,"Failed to open %s to report the core regids\n", sln );
            else
            {
                reportCores = 1;
                fprintf(stderr, "Writing the regid for each structure with a core to %s\n",
                        sln );
            }
        }
    }

    time(&tnow);

    fprintf(outfp, "#SYBYL/3DB HITLIST\n#\n");
    fprintf(outfp, "# Created: %s", ctime(&tnow) );
    fprintf(outfp, "#\n#@CLASS STRLIST\n#\n");

    fprintf(outfp, "#@FIELD CORESIM\tINT\n");
    fprintf(outfp, "#@FIELD TS_UNIQ_ID\tINT\n");
    fprintf(outfp, "#@FIELD TS_HIT_ID\tINT\n");
    fprintf(outfp, "#@FIELD TS_ATTACH_PEN\tINT\n");
    fprintf(outfp, "#@FIELD TS_FEATURE\tINT\n");
    fprintf(outfp, "#@FIELD TS_STERIC\tINT\n");
    fprintf(outfp, "#@FIELD TS_QID\tINT\n");

    err = TOP_CORE_QUERY(qct, outfp);
    if ( err )
        return err;

    while ( UTL_SCAN_GETS(infp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
            continue;

        cnt++;
        UTL_ERROR_CLEAR();
        ct = DB_IMPORT_SLN(sln);

        if ( !(cnt % 1000) )
        {
            time(&tnow);
            fprintf(stderr,"core searching hit %3d of %4d %s", nhit, cnt, ctime(&tnow) );
        }

        if ( !ct )
            continue;

        cord = (double *) 0;
        DB_CT_GET_CT_ATTR(ct, CtCt3DCoordSet, &cord, &natoms );
        if ( !cord )
        {
            DB_CT_DELETE_CT(ct);
            continue;
        }

        DB_CT_UTL_COUNT_FRAGS(ct, 0, (int *) 0, 0, (int *) 0, &nParts );
        if ( nParts != 1 )
        {
            DB_CT_DELETE_CT(ct);
            continue;
        }

        if ( normalize )
        {
            DB_CT_NORM_AROM(ct);
            DB_CT_STANDARD(ct, (int *) 0);
        }

        DB_CT_UTL_FIND_RINGS(ct);
        UTL_ERROR_CLEAR();

        regid = (char *) 0;
        DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
        if ( !regid )
            DB_CT_GET_CT_ATTR(ct, CtCtName, &regid );

        res = TOP_CORE_SEARCH(ct, radius, max_attachpen, &hasCore );
        if ( corefp && hasCore )
        {
            regid = (char *) 0;
            DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
            if ( !regid )
                DB_CT_GET_CT_ATTR(ct, CtCtName, &regid );
        }

        if ( res )
        {
            DB_CT_WRITE(outfp, res->strFrags[0] );
            DB_CT_WRITE(outfp, res->strFrags[1] );
            fflush(outfp);
            nhit++;
        }

        DB_CT_DELETE_CT(ct);
    }

    time(&tnow);
    fprintf(stderr,"core searching hit %3d of %4d %s", nhit, cnt, ctime(&tnow) );
    return 0;
}

static int DoMatrixSearching(FILE *infp, FILE *outfp )
{
    char **slns;
    int alloc_slns;
    int nused;
    char *sln;
    int *matrix;
    int i, j;
    int matrixSize;

    nused = 0;
    alloc_slns = 501;
    slns = (char **) calloc(alloc_slns, sizeof(char *) );

    /* Collect every non-comment SLN, doubling the array when it fills */
    while ( UTL_SCAN_GETS(infp, "\\", (char *) 0, &sln ) > 0 )
    {
        if ( *sln == '#' )
            continue;
        if ( nused >= alloc_slns )
        {
            alloc_slns *= 2;
            slns = (char **) realloc((char *) slns, alloc_slns * sizeof(char *) );
        }
        slns[nused] = strdup(sln);
        nused++;
    }

    matrix = TOP_MATRIX_SEARCH(slns, nused);
    if ( !matrix )
        return -1;

    matrixSize = nused * nused;
    for ( i = 0; i < matrixSize; i++ )
    {
        fprintf(outfp, "%d\n", matrix[i]);
    }

    return 0;
}



APPENDIX "B" - CT_TOP.H

#define TRIPOS_VERSION 1

typedef enum
{
    UseUnityFeatures,
    UsePreferredUnityFeatures,
    UseTopomerFeatures
} FeatureSetName;

typedef struct top_result_def
{
    struct CtConnectionTable *ct; /* is NOT FREED by TOP_FREE_RESULT, managed by caller */
    int idx;
    void *userdata; /* pointer to something else if needed */
    int filtered;
    double comfa_diff;
    double best2; /* best 2 piece hit */
    double best3; /* best 3 piece hit */
    double bestSub; /* best subset hit, when enabled */
    int hit3Piece; /* if true a 3 piece fragment was hit */
    struct CtConnectionTable *qFrags[3]; /* call TOP_FREE_RESULT to free memory, just pointers */
    struct CtConnectionTable *strFrags[3]; /* copies */
    double hexDiffs[3];
    double featureDiffs[3];
    double attachmentPenalty; /* for 3 piece only */
    int qids[3];
    int strids[3];
    int outside[3];
} top_result;

/* Topomer heterogeneous searching functions.
   1st call TOP_QUERY_OPTIONS with the query ct
   2nd call TOP_COMPARE_WDETAIL or TOP_COMPARE to do a topomer comparison
*/

/* only hits return a non-null pointer, use radius = -1.0 to return all results */
int TOP_QUERY_OPTIONS(struct CtConnectionTable *ct, int do2piece, int do3piece, int doSubset, int
minatoms, int autoScale, int partialMatch, int terminalFlag, int fallbackFlag, int hevDiff, int filterFlag,
double reductionFactor, double featureFactor, double attachmentFactor, double stepSize, FeatureSetName
featureSet, int useFeatureCharges, double *feat_weights, double extraPenalty, FILE *queryfp, int
debugLevel );

top_result *TOP_COMPARE_WDETAIL(struct CtConnectionTable *ct, double radius, int idx, int
keepCts );

double TOP_COMPARE(struct CtConnectionTable *ct, double radius, int *filtered, int idx );
/* TOP_COMPARE is faster, but no detail is returned, only the comfa_diff,
   negative upon failure
   results are returned even if below radius */

void TOP_FREE_RESULT(top_result *res, int freeRef);
void TOP_QUERY_DUMP(FILE *fp, char *id_fieldname );

int TOP_GET_STATS(int dumpRegions, int *r_tfrags, int *r_2compare, int *r_3compare, int
*r_fcompare, int *r_filtered, int *r_feat, double *r_outsidePerc );
int TOP_HEV_COUNT(struct CtConnectionTable *ct);

top_result *TOP_CORE_SEARCH(struct CtConnectionTable *ct, double radius, double maxattachpen,
int *r_hascore );

int TOP_CORE_QUERY( struct CtConnectionTable *ct, FILE *fp);
int *TOP_MATRIX_SEARCH(char **slns, int numSlns );

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <ctype.h>
#include <string.h>
#include <malloc.h>

#include "ct.h"
#include "import_proto.h"
#include "ct_int.h"
#include "ct_proto.h"
#include "srch2_proto.h"

#include "utl_mem.h"
#include "utl_str.h"
#include "utl_error.h"
#include "set.h"

#include "utl_geom.h"
#include "utl_set.h"
#include "comfa.h"
#include "ct_top.h"

#ifndef TRUE
#define TRUE 1
#endif

#define SPLIT_DEBUG 1

/*
#define DEBUG_VALID_B
#define HEV_STATS 1
#define CALC_BATCH_DIFF 1
#define USE_HEX 1
#define STD_REGION 1
#define NO_COMPRESSION 1
#define NUMBER_OF_COMPRESSION_FIELDS 5
#define NO_STRMAP 1
#define DEBUG_DETAIL 1
*/

#define MAX_FEATURES 200

#ifdef NUMBER_OF_COMPRESSION_FIELDS
#define COMPRESSION_POINTS NUMBER_OF_COMPRESSION_FIELDS
#else
#define COMPRESSION_POINTS 0
#endif

#define NO_REGIONS 11
static int max_regions;
static double qxmin = 999.0;
static double qymin = 999.0;
static double qzmin = 999.0;
static double qxmax = -999.0;
static double qymax = -999.0;
static double qzmax = -999.0;

static double aggreg_descale = 0.85; 

struct bond_detail_rec {
    set_ptr to_atts; /* if this is a topomerically labile bond,
                        points to set of atoms in fragment rooted at "to" */
    int best[3]; /* " " , ordered best three attachments to the "to" atom */
    int identical[2]; /* " " , TRUE if n'th and n-1'th attachments are identical */
    int natlvs2[2]; /* " " , difference, in # atoms, between n'th and n-1'th attachment */
    int lastnat[2]; /* " " , # ats in n-1'th attachment */
};

struct bond_top_rec {
    int from, to; /* end atom IDs */
    struct bond_detail_rec *detail; /* FALSE if bond is not topomerically labile */
};

struct topgraph {
    int maxatoms, maxbonds; /* allocated maximum values */
    int natoms, nbonds;
    int *bstart; /* pointers to first bond top rec for each atom */
    struct bond_top_rec *bstuff;
};

typedef struct aromset_def {
    int numAtoms;
    int *atoms;
} AromSet;

typedef struct frag_def {
    int baseAtom;
    int copyBaseAtom; /* baseAtom is from the Original ct, copyBaseAtom references this ct, the
                         fragment */
    int atomCnt;
    int hevCnt;
    int aromCnt;
    int id;
    int outside;
    int npoints; /* number of points in this region, sizeof topField */
    int regionIdx; /* which region to use, determines size of *topField */
    int *atoms;
    struct CtConnectionTable *ct;
    double *cords; /* a pointer into the ct's coordinates, don't free */
    double *topField;
#ifdef STD_REGION
    double *stdField;
    double *stdDiff;
#endif
#ifdef USE_HEX
    char *topHex;
    char *topInt; /* parsed string of ints, well chars valued 0-15 */
    int topIntSize;
#endif
    double *AtWts;
    double *hexDiff; /* sizeof number of fragments for comparing against current compound X */
    double *featureDiff;
    double *feature2PDiff;
    double *feature3PDiff;
    double *featureSubsetDiff;
    int *origMapping; /* Maps this ct's atoms into the ct into Split */
    double *cent; /* aromCnt * 4, x, y, z, and attrition factor is the 4th double */
    double outsidePenalty;
    double *qtf[NO_REGIONS]; /* query topomer fields */
} Frag;

typedef struct split2_def {
    int bondId;

    int frag1;
    int frag2;
    int *b1;
    int *b2;
#ifndef NO_STRMAP
    int *strMap; /* size of number of 2 piece fragments in structure see alloc2Map */
    int *subsetMap; /* size is the number of 3 piece fragments in the structure see allocSubsetMap */
#endif
} split2;

typedef struct split3_def {
    int bond1;
    int bond2;

    int frag1;
    int frag2;
    int frag3;
    int frag4;

    int *b1; /* atoms, change to a1,a2,a3 */
    int *b2;
    int *b3;
    int *b4;
#ifndef NO_STRMAP
    int *strMap; /* size of number of 3 piece fragments in structure see alloc3Map */
#endif
} split3;


typedef struct split_def {
    split2 *s2;
    split3 *s3;
    Frag *frags;

    struct CtConnectionTable *ct;
    int s2cnt;
    int s3cnt;
    int numFrags;

    int atomCount; /* number of atoms in the ct */
    int *atomMask; /* Which atoms are Hev atoms, and optionally not terminal atoms */
    int bondCount; /* Number of bonds in the ct */
    int *bondMask; /* Bonds where splits occur */
    int *singleBonds; /* single bonds not in rings, and not to primary atoms, H,Cl,Br */
    int numHev; /* number of heavy atoms in the ct */

    int *featureMask; /* array the size of atomCount. Mask recording which features each atom has. */
    int featureCnts[5]; /* total number of features, by type */
    int *aromMask; /* for features, the atoms which hit one of the aromatic patterns */
    int numArom;
    AromSet *aromSets; /* an array the size of numArom */

    int fragsBuilt;
    int connectedHBTotalCnt;
    int *connectedHBCnt; /* size of atomCount. # of connected atoms which are HBA & HBD and atom is HBA or HBD */
    int *connectedHBAtoms; /* size of atomCount * 5 */
#ifndef NO_STRMAP
    int alloc2Map;
    int alloc3Map;
    int allocSubsetMap;
#endif
} Split;

typedef struct branch_info_def {
    int toAtom;
    int chainSize;
    double molWeight;
} branchInfo;


typedef enum
{
    FeatureNone = 0x0,
    FeatureArom = 0x1,
    FeaturePos = 0x2,
    FeatureNeg = 0x4,
    FeatureHBA = 0x8,
    FeatureHBD = 0x10
} FeatureType;

static FeatureType fMasks[4] = { FeaturePos, FeatureNeg, FeatureHBA, FeatureHBD };
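Because the feature types are distinct bit flags, a single int per atom can record several features at once, and per-type totals are obtained by testing each bit. The following is a minimal sketch of that counting, assuming the same flag values as the enum above; the names `F_POS` etc. and `count_features` are hypothetical and chosen to avoid clashing with the listing.

```c
/* Same bit values as the FeatureType enum; names are illustrative only. */
enum { F_NONE = 0x0, F_AROM = 0x1, F_POS = 0x2, F_NEG = 0x4, F_HBA = 0x8, F_HBD = 0x10 };

/* Tallies how many atoms carry each of the four per-atom feature bits.
   counts[0..3] receive the Pos, Neg, HBA and HBD totals, in the same
   order as the fMasks table. */
void count_features(const int *mask, int natoms, int counts[4])
{
    static const int bits[4] = { F_POS, F_NEG, F_HBA, F_HBD };
    int i, k;

    for (k = 0; k < 4; k++)
        counts[k] = 0;
    for (i = 0; i < natoms; i++)
        for (k = 0; k < 4; k++)
            if (mask[i] & bits[k])
                counts[k]++;
}
```

An atom that is both a donor and positively charged contributes to two totals, which is why the test is a bitwise AND rather than an equality check.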
typedef struct feature_pat_def
{
    FeatureType f_type;
    int weight;
    int atomicId; /* if non-zero this atomic id must be present, Nitrogen and Oxygen are the only ones checked for */
    int ringIndicator; /* if non-zero, indicates ring membership: 1 is must be in ring, -1 must not be in ring */
    char *sln;
    struct CtConnectionTable *ct;
    void *pattern;
} FeaturePattern;

typedef struct {
    fpt lo[3], /* corner with lowest values for each axis */
        hi[3], /* " " hi-est " */
        stepsize[3]; /* increment between points */
    int nstep[3], /* derived as 1 + (hi-lo + epsilon) / stepsize */
        n; /* n = product of nstep[i] */

    int atom_type; /* SYBYL atom type, for steric energy computation */
    fpt pt_charge; /* elemental charge at point, for electrostatics */
    fpt *weight; /* weight[n] is applied in all computations, e.g = 1 */
    int avg_type; /* box of "scale", sphere, sphere x vdw, ...? */
    fpt avg_scale; /* scale whose meaning derived from avg_type */
    int arb, /* arbitrary int for later use */
        *parb; /* " pointer " " */
} l_Box, *l_BoxPtr;
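The comments on `nstep` and `n` give the lattice size in closed form: each axis holds `1 + (hi - lo + epsilon) / stepsize` points and the box holds the product of the three. A minimal sketch of that derivation follows, assuming `double` in place of the listing's `fpt` type and an epsilon of 1e-6 (the listing does not state the value); `box_point_count` is a hypothetical name.

```c
/* Fills nstep[] per the struct comment and returns n, the total number
   of lattice points. The epsilon guards against a step being lost to
   floating-point rounding when (hi - lo) is an exact multiple of the
   step size; its value here is an assumption. */
int box_point_count(const double lo[3], const double hi[3],
                    const double stepsize[3], int nstep[3])
{
    const double epsilon = 1e-6;
    int i, n = 1;

    for (i = 0; i < 3; i++) {
        nstep[i] = 1 + (int)((hi[i] - lo[i] + epsilon) / stepsize[i]);
        n *= nstep[i];
    }
    return n;
}
```

For a box spanning 4 x 2 x 6 units with a 2.0 step (the default `q_stepSize`), this gives 3 x 2 x 4 = 24 points.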

typedef struct {
    char *filename; /* name of the region's file (if any) */
    int n_boxes; /* number of boxes which make up the region */
    int n_points; /* number of points in this region altogether */
    l_BoxPtr box_array; /* box_array[n_regions], each one a Box */
    int n_refs; /* number of CURRENT references to this memory */
    long when_made; /* creation stamp */
} l_ComfaRegion, *l_RegionPtr;


typedef struct { 

unsigned int crc; 

char *sln; 

int hitcnt; 
} UniqSln; 


static l_ComfaRegion *regions[NO_REGIONS];
static int regionUseCnts[NO_REGIONS];
static l_RegionPtr stdRegion;
static int minRegion;
static int minRegion2P;
static int minRegion3P;


static int tot_frags;
static int tot_uniq_frags;
static int compounds; 
static int searchCnt; 

static int t_2compare; 
static int t_3compare; 
static int t_fcompare; 
static int t_filtered; 
static int t_featFiltered; 
static int t_outside; 
static int t_fields; 


static int *g_atomDist; 

static struct CtConnectionTable *g_ct; 


static double def_featureWeights[6] = { 20.0, 200.0, 200.0, 100.0, 100.0 }; 
static double featureWeights[6] = { 20.0, 200.0, 200.0, 100.0, 100.0 }; 

/* Local prototypes */ 

struct top_graph *TOP_INIT_GRAPH( struct top_graph *g, struct CtConnectionTable *ct );
static void ashow( set_ptr aset );

static Split *FindBreakPoints(CtConnectionTable *ct, int minHev, int termflag, int createFrags );
static int *findDirectionalNeighbors(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx, int termIdx2 );
static double *computeVdwWeights(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx, double reductionFactor, int **r_covered );
static int *createAtomMask(CtConnectionTable *ct, int termflag, int *r_hevCount);
static int validBreakPoint(CtConnectionTable *ct, int bondidx, int *atomMask, int minHev, int termflag, int **rb1, int **rb2 );
static int addSplit2(int bondId, int *b1, int *b2 );
static int addSplit3(int atomCnt, int bond1, int bond2, int *b1, int *b2, int *b3, int firstBase, int secondBase );
static void freeSplit(Split *s);
static void freeSplit2(split2 *s2, int cnt );
static void freeSplit3(split3 *s3, int cnt );
static void freeFrags(Frag *f, int cnt );
static void freeFragCts(Split *S);
static int freeStrMap(Split *S);
static int atomsOverlap(int atomcnt, int *b1, int *b2);
static int hevCount(int atomcnt, int *b, int *atomMask, int *r_numAtoms );
static int createFrag(int atomCnt, int *atoms, int *atomMask, int checkDup );
static Frag *createUniqFrags(int atomCnt, split2 *s2, int nums2, split3 *s3, int nums3, int *atomMask, int *r_numFrags );
static int getAtomIds(CtConnectionTable *ct, int a1, int *r_a2, int *r_a3 );
static double fieldHexDiff( char *cptr, char *cqtr, int nosq );
static double CompareAllFeatures(Split *query, Split *str, double radius );

static double CompareTwoCompounds(Split *query, Split *str, double radius, int *r_qidx, int *r_sidx,
    int *r_splitidx, int *r_three, int *r_subsetHit, double *best2, double *best3, double *bestSub,
    double *r_atp, int bailedout );

char *CT_FIELD2HEX( double *field, int size );
static char *hexStringToInts(char *cptr, int *r_size);
static double fieldIntDiff( char *cptr, char *cqtr, int s1, int s2 );
static double topFieldDiff(double *qry, double *str, int npoints );
static double topFieldCompressedDiff(double *qry, double *str, int npoints, double startPenalty );
static double fieldIntDiffSq( unsigned short *cptr, unsigned short *cqtr, int s1, int s2);
static double *computePathWeights(struct CtConnectionTable *ct, int baseAtom, int *atomDist, int *featureMask, int *ctMap );
static int getFromAtom(struct CtConnectionTable *ct, int *atomdist, double *molWeights, int atom, int toAtom, int baseAtom, double *cord );
static int debugHits( FILE *fp, Split *query, Split *str, int bestq, int bestStr, int bestIdx, int threeMatched );
static int topAlignCt(struct CtConnectionTable *ct, int baseAtom, int *featureMask, int *ctMapping );
static int traverseBranch( struct CtConnectionTable *ct, int atomId, int *atomdist, double *molweight,
    int rootToAtom, int *r_toatom, int *r_length, double *r_weight );
static int *findLargestBranch(struct CtConnectionTable *ct, int *atomdist, double *weights );
static CtBond *getBond(struct CtConnectionTable *ct, int id1, int id2 );
static int setTorsion(double *coo, int nAtoms, int a1, int a2, int a3, int a4, double value );
static int reflectAtoms( double *coo, int nAtoms, int npt, int *aplane );
static int setBaseTorsion(double *coo, int nAtoms, int a3, int a4, double value );
static int setRootTorsion(double *coo, int nAtoms, int a2, int a3, int a4, double value );
static int get_details( top_result *res, Split *query, Split *str, int bestq, int bestStr, int bestIdx,
    int threeMatched, int subsetHit, int keepCts );
static top_result *top_compare(struct CtConnectionTable *ct, double radius, int details, int idx, int keepCts );

static struct CtConnectionTable *makeFragCopy(struct CtConnectionTable *ct, int id, int hexdiff );
static void writeCopy(FILE *fp, struct CtConnectionTable *ct, int id, int hexdiff, char *fieldname );
static void setAttr(struct CtConnectionTable *ct, char *name, char *value );
static double computeAttachmentPenalty( Frag *qry, Frag *str, Frag *other_qry, Frag *other_str );
static FeaturePattern *InitFeaturePatterns(int *r_numPatterns);
static int SearchForFeatures(Split *S);
static int computeCentroid( double *cords, int *atoms, int numAtoms, double *r_x, double *r_y, double *r_z );
static void addCentroid(Frag *fptr, int natoms, double attrFact, double x, double y, double z );
static double compareFeatures(Split *qs, Frag *qry, Split *ss, Frag *str, int query2ndAttach, int str2ndAttach );
static double featureScaling(int *featureCnts, int *extraFeatureCnts, double *featureContributions, int nbest );
static int BuildTopomers(CtConnectionTable *ct, Split *S, Split *query);
static int BuildFrags(Split *S);
static int atomsOutside(double *coords, int natoms, l_RegionPtr regp, double *atwts, double *r_outpen );
static int makeTopRegions(double stepSize, int numFrags);
static l_RegionPtr getRegionToUse(double *coords, int natoms, int *r_idx, int *n_points );
static void getQueryExtents(double *coords, int atomCnt );
static int getCordExtents(double *coords, int natoms, double *r_minx, double *r_miny, double *r_minz,
    double *r_maxx, double *r_maxy, double *r_maxz );
static double *compressField(double *fptr, int npoints );
static int compareFields(double *orig, double *atombased, int npoints );
static void stripCharge(struct CtConnectionTable *ct, CtAtom *aptr, int atomidx);
static int dupCheckCore(struct CtConnectionTable *ct, int *r_uniqid, int *r_hitid );
struct CtConnectionTable *getLargestFrag(struct CtConnectionTable *ct );
static void CoverConnectedHB(Split *qs, struct CtConnectionTable *ct, double *HB);
static int double_compare(const void *vnrec, const void *vtrec );
static double MeasureClosest(Split *qs, Frag *q1, Split *str, Frag *f1, double *da, double *aa, int *r_nofeatures);
static void PartialMatchFeatures(Split *qs, int mode, Frag *q1, Frag *q2, Frag *q3, Frag *q4, Split *str,
    Frag *f1, Frag *f2, Frag *f3, Frag *f4, int matchCnt );
static int makeSplit3(CtConnectionTable *ct, int *atomMask, split2 *sall, int cnt, int minHev );
static int getFromChiralAtoms(struct CtConnectionTable *ct, int *atomdist, double *molw, int atom, int toAtom,
    int *r_fromAtom, int *r_toatom);
static int getFromRingCount(struct CtConnectionTable *ct, int *atomdist, int atom, int toAtom );
static double get_path_mw( set_ptr aset, struct CtConnectionTable *ct, double mw );

static split2 *g_split2;
static int g_splitcnt;
static int g_splitalloc;

static split3 *g_split3;
static int g_split3Cnt;
static int g_split3Alloc;

static Frag *g_fragHead;
static int g_fragCnt;
static int g_fragAlloc;

static char *regid;


/* Query options */ 

static struct CtConnectionTable *q_ct;

static double q_bailout;

static FeatureSetName q_featureSet;
static int q_useFeatureCharges;

static double q_attachPenFactor = 100.0;
static double q_featureFactor = 1.0;
static double q_extraFeatureFactor = 0.1;

static int q_minatoms; /* minimum HEV atoms per fragment */
static int q_autoScale; /* automatic scaling of sensitivity of neighbors based upon the query. */
static int q_partialMatch; /* partial match count for HBA and HBD */
static double autoScaleFactor; /* steric auto scaling factor */

static int q_termFlag; /* if TRUE term atoms are counted */
static int q_do2piece; /* if TRUE do 2 piece comparisons */
static int q_do3piece; /* if TRUE do 3 piece comparisons */
static int q_doSubset; /* if TRUE do subset comparisons, 2 piece query with 3 piece structure. Hit larger compounds */
static int q_minSubsetSize = 15;
static int q_matrixMode;
static int q_coremode;
static int q_coremode_align;

static int q_fallback; /* if TRUE fallback on minatoms to 3 and count terminal atoms */
static int q_hevDiff; /* maximum allowed hev atoms, inclusive */
static int q_filter; /* if TRUE filtering is enabled */

static int q_regionMode;
static double q_stepSize = 2.0;
static double q_ReductionFactor = 0.85; /* reduction factor */
static int q_debugLevel;
static FILE *q_debugfp;
static FILE *debug2;

static Split *qs; /* query split structure & topomers */

static int qmode;

#if 0

int top_test_debug(char *fname)
{
    if ( debug_fp )
        fclose(debug_fp);
    debug_fp = (FILE *) 0;
    if ( fname )
        debug_fp = fopen(fname,"w");
    return 0;
}

#endif


int TOP_QUERY_OPTIONS(struct CtConnectionTable *ct,
    int do2piece, int do3piece, int doSubset, int minatoms, int autoScaleSteric, int partialMatch,
    int terminalFlag, int fallbackFlag, int hevDiff, int filterFlag,
    double reductionFactor, double featureFactor, double attachmentFactor,
    double stepSize, FeatureSetName featureSet, int useFeatureCharges, double *feat_weights,
    double extraPenalty,
    FILE *debug_fp, int debugLevel )

{
    int i;
    double *cord;
    double *wptr;
    int numSplits;

    if (ct && !DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &cord, &i))
    {
        UTL_ERROR_CLEAR();
        return -1;
    }

    UTL_ERROR_CLEAR();

    if ( feat_weights )
        wptr = feat_weights;
    else
        wptr = def_featureWeights;

    if ( useFeatureCharges )
        def_featureWeights[1] = def_featureWeights[2] = 0.0;
    else
        def_featureWeights[1] = def_featureWeights[2] = 200.0;

    for ( i = 0; i < 5; i++ )
        featureWeights[i] = wptr[i];

    qmode = 1;
    if ( ct )
    {
        DB_CT_NORM_AROM(ct);
        DB_CT_STANDARD(ct, (int *) 0);
        DB_CT_UTL_FIND_RINGS(ct);
    }

    numSplits = 8;
    if ( minatoms < -1 )
    {
        fallbackFlag = numSplits = minatoms * -1;
        minatoms = ct->atomCount / 2;
    }

    q_featureSet = featureSet;
    q_useFeatureCharges = useFeatureCharges;
    q_extraFeatureFactor = extraPenalty;
    q_minatoms = minatoms;
    q_autoScale = autoScaleSteric;
    if ( q_autoScale < 0 )
        q_autoScale = 0;
    if ( q_autoScale && q_autoScale < 20 )
        q_autoScale = 20;
    q_partialMatch = partialMatch;
    q_termFlag = terminalFlag;
    q_do2piece = do2piece;
    q_do3piece = do3piece;
    q_doSubset = doSubset;
    q_fallback = fallbackFlag;
    q_filter = filterFlag;
    q_debugfp = debug_fp;
    q_debugLevel = debugLevel;
    q_hevDiff = hevDiff;
    q_ReductionFactor = reductionFactor;
    q_featureFactor = featureFactor;
    q_attachPenFactor = attachmentFactor * attachmentFactor; /* square what is passed in */
    q_stepSize = stepSize;

    if (ct)
    {
        fprintf(stderr, "Initializing query...\n");
        qs = FindBreakPoints(ct, minatoms, terminalFlag, TRUE );
        i = minatoms;
        if ( terminalFlag == 0 )
            i--;
        if ( q_fallback > 1 )
        {
            while ( (!qs || qs->s2cnt < q_fallback ) && i >= 3)
            {
                if ( qs )
                    freeSplit(qs);
                qs = FindBreakPoints(ct, i, 1, TRUE );
                q_minatoms = i;
#ifdef TRIPOS_VERSION
                if (qs)
                    fprintf(stderr," Minatoms: %d number of fragments: %d 2piece:%d 3piece: %d\n",
                        i, qs->numFrags, qs->s2cnt, qs->s3cnt);
#endif
                i--;
            }
        }
        else
        {
            if ( !qs || qs->numFrags == 0 )
                fallbackFlag = 1;
            while ( (!qs || qs->numFrags == 0) && i >= 3)
            {
                if (qs)
                    freeSplit(qs);
                qs = FindBreakPoints(ct, i, 1, TRUE );
                i--;
            }
        }
#ifdef TRIPOS_VERSION
        if ( q_minatoms != minatoms )
            fprintf(stderr, "running the query with a minimum heavy atom count of %d vs %d\n",
                q_minatoms, minatoms );
#endif
        if (qs)
        {
            qs->ct = ct;
            SearchForFeatures(qs);
            BuildFrags(qs);
            BuildTopomers(ct, qs, (Split *) 0);
        }
        fprintf(stderr, "query initialized\n");
        qmode = 0;

        if ( qs && qs->numFrags > 0 )
        {
            /* 25 is just a guess as of right now, 1/19/01. Need to evaluate.
               Small structures are hitting too many compounds, so we
               need to make the steric and features more sensitive;
               large structures are not hitting enough structures, so make
               less sensitive.

               example values: 12 hev atoms 25.0 / 12.0 = 2.1
               increases the steric contribution by a little bit more than twice as much.

               50 hev atoms 25.0 / 50.0
               = 0.5 would decrease the steric contribution by half. This may be too much.

               75 hev atoms 25.0 / 75.0
               = 0.33 would decrease the steric contribution to 1/3. Again maybe too much.
            */
            if ( q_autoScale )
            {
                /* based upon average drug like structure containing 25 heavy atoms */
                autoScaleFactor = (double) q_autoScale / (double) qs->numHev;
                if ( autoScaleFactor < 1.0)
                    autoScaleFactor = (2.0 + autoScaleFactor) / 3.0;
                fprintf(stderr,"Auto steric scaling factor : %8.2lf\n", autoScaleFactor );
            }
            else
                autoScaleFactor = 1.0;
            return 0; /* everything is just fine, found some fragments */
        }

        if (qs)
        {
            freeSplit(qs);
            qs = (Split *) 0;
        }
    }

    qmode = 0;
    return -2; /* failed */
}
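The auto-scaling logic in TOP_QUERY_OPTIONS can be isolated as a small pure function, which makes the damping step easier to see: a raw ratio below 1.0 is pulled two thirds of the way back toward 1.0, so large queries are desensitized less sharply than the plain ratio would suggest. The sketch below reproduces the clamping and damping from the listing; the function name `auto_scale_factor` and the guard against a non-positive heavy-atom count are additions made for illustration.

```c
/* Sketch of the steric auto-scale computation. autoScale is the
   user-supplied reference heavy-atom count (0 disables scaling, and
   positive values below 20 are raised to 20, as in the listing);
   numHev is the query's heavy-atom count. The numHev <= 0 guard is an
   assumption added here to avoid division by zero. */
double auto_scale_factor(int autoScale, int numHev)
{
    double f;

    if (autoScale < 0)
        autoScale = 0;
    if (autoScale && autoScale < 20)
        autoScale = 20;
    if (!autoScale || numHev <= 0)
        return 1.0;

    f = (double)autoScale / (double)numHev;
    if (f < 1.0)
        f = (2.0 + f) / 3.0; /* damp the reduction for large queries */
    return f;
}
```

With a reference of 25, a 50 heavy-atom query yields (2.0 + 0.5) / 3.0, about 0.83, rather than the undamped 0.5 that the source comment flags as possibly too aggressive.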


void TOP_QUERY_DUMP(FILE *fp, char *id_fieldname )
{
    int i;
    Frag *f;

    if ( !fp || !id_fieldname || !qs )
        return;

    if (qs->ct)
        DB_CT_WRITE(fp, qs->ct);
    for ( i = 0; i < qs->numFrags; i++ )
    {
        f = qs->frags + i;
        if (f->ct)
            writeCopy(fp, f->ct, i, -1, id_fieldname );
    }
}

top_result *TOP_COMPARE_WDETAIL( struct CtConnectionTable *ct, double radius, int idx, int keepCts )
{
    top_result *res;
    top_result *rescopy;

    if ( radius <= 0.0 )
        radius = 99999.9;

    res = top_compare(ct, radius, 1, idx, keepCts );
    if ( res && res->comfa_diff <= radius )
    {
        rescopy = (top_result *) malloc(sizeof(top_result) );
        memcpy((char *) rescopy, (char *) res, sizeof(top_result) );
        return rescopy;
    }
    else if ( res )
    {
        TOP_FREE_RESULT(res, 0 );
    }

    return (top_result *) 0;
}

/*
   Compare the ct structure with 3D coordinates with
   the ct specified to TOP_QUERY_OPTIONS,

   returns the topomeric difference or a negative value upon
   failure or being filtered out.

   returns the filtered status through the filtered pointer.

   The input radius is passed in for filtering reasons
*/
double TOP_COMPARE(struct CtConnectionTable *ct, double radius, int *filtered, int idx )
{
    top_result *res;
    double comfa_diff;

    UTL_ERROR_CLEAR();
    *filtered = 0;

    if ( radius <= 0.0 )
        radius = 99999.9;
    res = top_compare(ct, radius, 0, idx, 0 );
    if ( res )
    {
        comfa_diff = res->comfa_diff;
        TOP_FREE_RESULT(res,0);
        return comfa_diff;
    }

    return -1.0;
}

static top_result *top_compare(struct CtConnectionTable *ct, double radius, int details, int idx, int keepCts )
{
    static top_result ts[1];
    int i;
    Split *s;
    double *cord;
    double comfa_diff, best2, best3;
    int qidx, sidx, splitidx, splitInThree, subsetHit;
    int bailedout;
    int strmin;
    static int env_minSubsetSize = -1;
#ifdef HEV_STATS
    static FILE *bfp;
    char *regid;
#endif


    UTL_ERROR_CLEAR();

    if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &cord, &i))
        return (top_result *) 0;
    DB_CT_UTL_FIND_RINGS(ct);

    if ( q_fallback > 1 )
    {
        i = strmin = ct->atomCount / 2;
        s = FindBreakPoints(ct, i, q_termFlag, TRUE );
        if ( q_termFlag )
            i--;
        while ( (!s || s->s2cnt < q_fallback ) && i )
        {
            if (s)
                freeSplit(s);
            strmin = i;
            i--;
            s = FindBreakPoints(ct, i, 1, TRUE );
        }
#if 0
        fprintf(stderr, "structure min atoms: %d\n", strmin );
#endif
    }
    else
    {
        searchCnt++;
        s = FindBreakPoints(ct, q_minatoms, q_termFlag, TRUE );
        if ( !s )
            return (top_result *) 0;
        i = q_minatoms;
        if ( q_termFlag )
            i--;
        while ( s && s->numFrags == 0 && i && q_fallback )
        {
            freeSplit(s);
            s = FindBreakPoints(ct, i, 1, TRUE );
            i--;
        }
    }

    if ( !s || !s->s2cnt)
    {
        if (s)
            freeSplit(s);
        return (top_result *) 0;
    }

    if ( env_minSubsetSize == -1 )
    {
        char *tptr;

        tptr = getenv("DBTOP_MIN_HEV");
        if (tptr)
        {
            env_minSubsetSize = atoi(tptr);
            if ( env_minSubsetSize < 0 )
                env_minSubsetSize = 0;
        }
        else
            env_minSubsetSize = 0;
    }

    q_minSubsetSize = env_minSubsetSize; /* qs->numHev - # some number */
    q_bailout = radius * radius;

    memset((char *) ts, '\0', sizeof(top_result) );
    s->ct = ct;
    SearchForFeatures(s);

    if ( q_featureFactor > 0.0 )
        ts->comfa_diff = CompareAllFeatures(qs, s, radius );
    if ( ts->comfa_diff <= radius )
    {
        if ( q_featureFactor > 0.0 )
            BuildTopomers(ct, s, qs);
        else
            BuildTopomers(ct, s, (Split *) 0 );
        ts->comfa_diff = CompareTwoCompounds(qs, s, radius, &qidx, &sidx, &splitidx,
            &splitInThree, &subsetHit,
            &(ts->best2), &(ts->best3), &(ts->bestSub),
            &(ts->attachmentPenalty), bailedout );
    }
    else
    {
        t_featFiltered++;
        qidx = -1; /* Indicate no indexing */
    }

    ts->ct = ct; /* save a pointer to the ct being compared */
    ts->idx = idx;

#ifdef HEV_STATS
    regid = (char *) 0;
    DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
    if ( !regid )
        DB_CT_GET_CT_ATTR(ct, CtCtName, &regid );
    if ( !bfp )
        bfp = fopen("hev.stats", "w");
    fprintf(bfp,"%s %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d\n", regid, s->numHev,
        qs->numHev,
        qs->numHev - s->numHev,
        abs(s->numHev - qs->numHev),
        (int) ts->comfa_diff, (int) ts->best2, (int) ts->best3,
        s->numFrags, s->s2cnt, s->s3cnt);
    if ( !(idx % 100 ) )
        fflush(bfp);
#endif

    if ( details && qidx >= 0 )
    {
        if ( get_details(ts, qs, s, qidx, sidx, splitidx, splitInThree, subsetHit, keepCts ) )
        {
            ts->comfa_diff = q_bailout;
            fprintf(stderr," internal failure, please provide query, options, and structure below\n");
            if (s->ct)
                DB_CT_WRITE(stderr, s->ct);
        }
    }

    freeSplit(s);
    return ts;
}

static double CompareAllFeatures(Split *query, Split *str, double radius )
{
    double best;
    static Split *qfeatInit;
    static int qFeatures[5];
    int sFeatures[5];
    static int tsearched;
    double best2, best3, bestsub;
    double d1, d2, d3, d4, d5, d6;
    double dval[6];
    int hevCnts[6];
    double attPen[2];
    int bestQ, bestStr;
    int bestIdx;
    int threeIsBetter = 0;
    int SubIsBetter = 0;
    int id1, id2, id3, id4;
    int i, j, k, l;
    int ids[3];
    Frag *f, *sf;
    Frag *q1, *q2, *q3, *q4;
    Frag *fs1, *fs2, *fs3, *fs4;
    Frag *fragPtrs[3];
    Frag *qActive;
    split2 *qs2, *ss2;
    split3 *qs3, *ss3;
    double *dptr;
    double hexdiff;
    int max3;
    static Split *qInit;
    double bailout;
    static int t_quick;
    int combo2, combo3;
    int nskip2, nskip3;


    memset((char *) sFeatures, '\0', sizeof(sFeatures) );
    for ( i = 0; i < str->atomCount; i++ )
    {
        if ( str->featureMask )
        {
            if ( str->featureMask[i] & FeaturePos )
                sFeatures[1] += 1;
            if ( str->featureMask[i] & FeatureNeg )
                sFeatures[2] += 1;
            if ( str->featureMask[i] & FeatureHBA )
                sFeatures[3] += 1;
            if ( str->featureMask[i] & FeatureHBD )
                sFeatures[4] += 1;
        }
    }
    sFeatures[0] = str->numArom;

    if ( qfeatInit != query )
    {
        memset((char *) qFeatures, '\0', sizeof(qFeatures) );
        for ( i = 0; i < query->atomCount; i++ )
        {
            if ( query->featureMask )
            {
                if ( query->featureMask[i] & FeaturePos )
                    qFeatures[1] += 1;
                if ( query->featureMask[i] & FeatureNeg )
                    qFeatures[2] += 1;
                if ( query->featureMask[i] & FeatureHBA )
                    qFeatures[3] += 1;
                if ( query->featureMask[i] & FeatureHBD )
                    qFeatures[4] += 1;
            }
        }
        qFeatures[0] = query->numArom;
        qfeatInit = query;

        fprintf(stderr," Query feature counts Arom: %d Pos & Neg: %d & %d HBA & HBD: %d & %d\n",
            qFeatures[0], qFeatures[1], qFeatures[2], qFeatures[3], qFeatures[4] );
    }

#if 0
    fprintf(stderr, "structure feature counts Arom: %d Pos & Neg: %d & %d HBA & HBD: %d & %d \n",
        sFeatures[0], sFeatures[1], sFeatures[2], sFeatures[3], sFeatures[4] );
#endif

    tsearched++;

    if ( q_partialMatch == 0 )
    {
        for ( best = 0.0, i = 0; i < 5; i++ )
        {
#define SAFE_FEATURE_QUICK
#ifdef SAFE_FEATURE_QUICK
            if ( qFeatures[i] && !sFeatures[i] )
                best += featureWeights[i] * featureWeights[i] * (double) ( (qFeatures[i] - sFeatures[i]) );
#else
            if ( qFeatures[i] > sFeatures[i] )
                best += featureWeights[i] * featureWeights[i] * (double) ( (qFeatures[i] - sFeatures[i]) ) * q_ReductionFactor;
#endif
        }

        if ( best < 0.0 )
            best = 0.0;
        best = sqrt(best);
        if ( best > radius )
        {
            t_quick++;
            return 9999.00;
        }
    }
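The quick filter above rejects a structure before any fragments are built: for every feature type that the query has and the structure lacks entirely, it accumulates the squared weight times the number of missing features, and bails out when the root of that sum exceeds the search radius. A minimal standalone sketch of the same test follows. It compares the sum against the squared radius instead of taking a square root, which is equivalent for a non-negative radius; the name `quick_feature_reject` is hypothetical.

```c
/* Returns 1 if the structure can be rejected on feature counts alone,
   0 otherwise. q[] and s[] hold the per-type counts (Arom, Pos, Neg,
   HBA, HBD) for query and structure; w[] holds the per-type weights.
   Only feature types completely absent from the structure contribute,
   matching the SAFE_FEATURE_QUICK branch of the listing. */
int quick_feature_reject(const int q[5], const int s[5],
                         const double w[5], double radius)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 5; i++)
        if (q[i] && !s[i])
            sum += w[i] * w[i] * (double)(q[i] - s[i]);

    return sum > radius * radius;
}
```

With the default weights, a query holding two acceptors against a structure with none accumulates 100 * 100 * 2 = 20000, so it is rejected at radius 100 but passes at radius 150.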

    BuildFrags(str); /* Postpone building the frags until after the quick feature filtering */
    for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
    {
        if ( q_partialMatch )
        {
            if ( f->feature2PDiff )
                free((char *) f->feature2PDiff);
            if ( f->feature3PDiff )
                free((char *) f->feature3PDiff);
            if ( f->featureSubsetDiff )
                free((char *) f->featureSubsetDiff);
            f->feature2PDiff = (double *) calloc(str->numFrags, sizeof(double) );
            f->feature3PDiff = (double *) calloc(str->numFrags, sizeof(double) );
            f->featureSubsetDiff = (double *) calloc(str->numFrags, sizeof(double) );
            for ( j = 0; j < str->numFrags; j++ )
            {
                f->feature2PDiff[j] = -1.0;
                f->feature3PDiff[j] = -1.0;
                f->featureSubsetDiff[j] = -1.0;
            }
            f->featureDiff = f->feature2PDiff;
        }
        else
        {
            if ( f->featureDiff )
                free((char *) f->featureDiff);
            f->featureDiff = (double *) calloc(str->numFrags, sizeof(double) );
            for ( j = 0; j < str->numFrags; j++ )
            {
                f->featureDiff[j] = -1.0;
            }
        }
    }

    best = 9999.0 * 9999.0;
    bailout = radius * radius;
    best3 = best2 = bestsub = best;

    combo2 = combo3 = nskip2 = nskip3 = 0;

    /*
       2 piece feature comparisons
    */

    if ( query->s2 && str->s2 && q_do2piece )
    {
        for ( i = 0, qs2 = query->s2; i < query->s2cnt; i++, qs2++ )
        {
            q1 = query->frags + qs2->frag1;
            q2 = query->frags + qs2->frag2;
#ifndef NO_STRMAP
            if ( !qs2->strMap || str->s2cnt > query->alloc2Map )
            {
                if ( qs2->strMap && query->alloc2Map )
                    free(qs2->strMap);
                if ( str->s2cnt > 0 )
                    qs2->strMap = (int *) calloc(str->s2cnt, sizeof(int) );
                else
                    qs2->strMap = (int *) 0;
            }
#endif
            if ( qs2->frag1 == -1 || qs2->frag2 == -1 )
                continue;
            for ( j = 0, ss2 = str->s2; j < str->s2cnt; j++, ss2++ )
            {
#ifndef NO_STRMAP
                qs2->strMap[j] = 0;
                combo2++;
#endif
                if ( ss2->frag1 == -1 || ss2->frag2 == -1 )
                    continue;
                fs1 = str->frags + ss2->frag1;
                fs2 = str->frags + ss2->frag2;
                id1 = fs1->id;
                id2 = fs2->id;

                if ( q_partialMatch )
                {
                    PartialMatchFeatures(query, 2, q1, q2, (Frag *) 0, (Frag *) 0, str, fs1,
                        fs2, (Frag *) 0, (Frag *) 0, q_partialMatch );
                    PartialMatchFeatures(query, 2, q1, q2, (Frag *) 0, (Frag *) 0, str, fs2,
                        fs1, (Frag *) 0, (Frag *) 0, q_partialMatch );
                }
                else
                {
                    if ( q1->featureDiff[id1] == -1.0 )
                        q1->featureDiff[id1] = compareFeatures( query, q1, str, fs1, -1, -1 );
                    if ( q1->featureDiff[id2] == -1.0 )
                        q1->featureDiff[id2] = compareFeatures( query, q1, str, fs2, -1, -1 );
                    if ( q2->featureDiff[id1] == -1.0 )
                        q2->featureDiff[id1] = compareFeatures( query, q2, str, fs1, -1, -1 );
                    if ( q2->featureDiff[id2] == -1.0 )
                        q2->featureDiff[id2] = compareFeatures( query, q2, str, fs2, -1, -1 );
                }

                d1 = q1->featureDiff[id1] + q2->featureDiff[id2];
                if ( d1 < best )
                {
                    bestQ = i;
                    bestStr = j;
                    best = best2 = d1;
                    bestIdx = 0;
                }

                d2 = q1->featureDiff[id2] + q2->featureDiff[id1];
                if ( d2 < best )
                {
                    bestQ = i;
                    bestStr = j;
                    best = best2 = d2;
                    bestIdx = 1;
                }

#ifndef NO_STRMAP
                if ( d1 <= q_bailout || d2 < q_bailout )
                {
                    qs2->strMap[j] = 1;
                    nskip2++;
                }
#endif
            }
        }

        if ( str->s2cnt > query->alloc2Map )
            query->alloc2Map = str->s2cnt;
    }
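The two-piece comparison scores each query split against each structure split in both orientations, since either query fragment may correspond to either structure fragment, and keeps the smaller sum. The sketch below shows just that pairing step on a 2 x 2 table of already-computed fragment differences; the function name `best_two_piece` and the table layout are illustrative assumptions rather than part of the listing.

```c
/* d[qi][sj] is the feature difference between query fragment qi and
   structure fragment sj (both 0 or 1). Returns the better of the two
   pairings and reports the orientation through r_orient:
   0 = (q1,s1)+(q2,s2), 1 = (q1,s2)+(q2,s1). Ties keep orientation 0,
   mirroring the strict "<" used in the listing. */
double best_two_piece(double d[2][2], int *r_orient)
{
    double d1 = d[0][0] + d[1][1];
    double d2 = d[0][1] + d[1][0];

    if (d2 < d1) {
        *r_orient = 1;
        return d2;
    }
    *r_orient = 0;
    return d1;
}
```

In the listing the four entries are cached in each query fragment's `featureDiff` array, with -1.0 marking a value not yet computed, so a fragment pair shared by several splits is scored only once.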


    /*
       3 piece feature comparisons
    */
    for ( i = 0, qs3 = query->s3; q_do3piece && qs3 && i < query->s3cnt; i++, qs3++ )
    {
        q1 = query->frags + qs3->frag1;
        q2 = query->frags + qs3->frag2;
        q3 = query->frags + qs3->frag3;
        q4 = query->frags + qs3->frag4;
#ifndef NO_STRMAP
        if ( !qs3->strMap || str->s3cnt > query->alloc3Map )
        {
            if ( qs3->strMap && query->alloc3Map )
                free((char *) qs3->strMap);
            if ( str->s3cnt > 0 )
                qs3->strMap = (int *) calloc(str->s3cnt, sizeof(int) );
            else
                qs3->strMap = (int *) 0;
        }
        if ( qs3->frag1 == -1 || qs3->frag2 == -1 || qs3->frag3 == -1 )
            continue;
#endif
        for ( j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
        {
#ifndef NO_STRMAP
            qs3->strMap[j] = 0;
            combo3++;
#endif
            if ( ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
                continue;
            fs1 = str->frags + ss3->frag1;
            fs2 = str->frags + ss3->frag2;
            fs3 = str->frags + ss3->frag3;
            fs4 = str->frags + ss3->frag4;
            id1 = fs1->id;
            id2 = fs2->id;
            id3 = fs3->id;
            id4 = fs4->id;

            if ( q_partialMatch )
            {
                PartialMatchFeatures(query, 3, q1, q2, q3, q4, str, fs1, fs2, fs3, fs4,
                    q_partialMatch );
                PartialMatchFeatures(query, 3, q1, q2, q3, q4, str, fs4, fs3, fs2, fs1,
                    q_partialMatch );
            }
            else
            {
                if ( q1->featureDiff[id1] == -1.0 )
                    q1->featureDiff[id1] = compareFeatures( query, q1, str, fs1, -1, -1 );
                if ( q1->featureDiff[id4] == -1.0 )
                    q1->featureDiff[id4] = compareFeatures( query, q1, str, fs4, -1, -1 );
                if ( q4->featureDiff[id1] == -1.0 )
                    q4->featureDiff[id1] = compareFeatures( query, q4, str, fs1, -1, -1 );
                if ( q4->featureDiff[id4] == -1.0 )
                    q4->featureDiff[id4] = compareFeatures( query, q4, str, fs4, -1, -1 );
                if ( q2->featureDiff[id2] == -1.0 )
                    q2->featureDiff[id2] = compareFeatures( query, q2, str, fs2, -1, -1 );
                if ( q2->featureDiff[id3] == -1.0 )
                    q2->featureDiff[id3] = compareFeatures( query, q2, str, fs3, -1, -1 );
                if ( q3->featureDiff[id3] == -1.0 )
                    q3->featureDiff[id3] = compareFeatures( query, q3, str, fs3, -1, -1 );
                if ( q3->featureDiff[id2] == -1.0 )
                    q3->featureDiff[id2] = compareFeatures( query, q3, str, fs2, -1, -1 );
            }

            attPen[0] = attPen[1] = 0.0;
            dval[0] = 0.0;
            dval[1] = 0.0;

            if ( q_attachPenFactor > 0.0 )
            {
                attPen[0] = ( computeAttachmentPenalty( q1, fs1, q4, fs4 ) +
                              computeAttachmentPenalty( q4, fs4, q1, fs1 ) );
                attPen[1] = ( computeAttachmentPenalty( q1, fs4, q4, fs1 ) +
                              computeAttachmentPenalty( q4, fs1, q1, fs4 ) );
                dval[0] += attPen[0];
                dval[1] += attPen[1];
            }

            if ( q_featureFactor > 0.0 )
            {
                dval[0] += ( q1->featureDiff[id1] + q4->featureDiff[id4] ) / 2.0 +
                           q2->featureDiff[id2] + q3->featureDiff[id3];
                dval[1] += ( q1->featureDiff[id4] + q4->featureDiff[id1] ) / 2.0 +
                           q2->featureDiff[id3] + q3->featureDiff[id2];
            }

            max3 = 2;
            for ( k = 0; k < max3; k++ )
            {
                if ( dval[k] < best )
                {
                    best = best3 = dval[k];
                    bestQ = i;
                    bestStr = j;
                    bestIdx = k;
                    threeIsBetter = 1;
                }
                else if ( dval[k] < best3 )
                    best3 = dval[k];
#ifndef NO_STRMAP
                if ( dval[k] <= q_bailout && qs3->strMap[j] == 0 )
                {
                    qs3->strMap[j] = 1;
                    nskip3++;
                }
#endif
            }
        }
    }
    if ( str->s3cnt > query->alloc3Map )
        query->alloc3Map = str->s3cnt;


/*
   subset feature comparisons

   Compare the query 2 piece fragmentation with the 3 piece structure fragmentation.
   Match A-B in the query with A-B or B-C in the structure, where B is the center
   piece of the structure.

   For comparing two piece with 3 piece: frag1 and frag2 are a set, while frag3 and
   frag4 are a set, in that the attachment bond that is broken defines the connection
   between frag1 and frag2. Frag3 and frag4 are the second split. Frag1 and frag4 are
   the center/core fragments, aligned from the different starting attachment atom.
*/

if ( query->s2 && str->s3 && q_doSubset )
{
    /* loop over query 2 piece fragments, and compare with structure 3 piece
       fragments. */
    for ( i = 0, qs2 = query->s2; i < query->s2cnt; i++, qs2++ )
    {
        if ( qs2->frag1 == -1 || qs2->frag2 == -1 )
            continue;

        q1 = query->frags + qs2->frag1;
        q2 = query->frags + qs2->frag2;
#ifndef NO_STRMAP
        if ( !qs2->subsetMap || str->s3cnt > query->allocSubsetMap )
        {
            if ( qs2->subsetMap && query->allocSubsetMap )
                free(qs2->subsetMap);
            if ( str->s3cnt > 0 )
                qs2->subsetMap = (int *) calloc(str->s3cnt, sizeof(int) );
            else
                qs2->subsetMap = (int *) 0;
        }
#endif
        for ( j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
        {
            if ( ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
                continue;
#ifndef NO_STRMAP
            qs2->subsetMap[j] = 0;
#endif
            fs1 = str->frags + ss3->frag1;
            fs2 = str->frags + ss3->frag2;
            fs3 = str->frags + ss3->frag3;
            fs4 = str->frags + ss3->frag4;
            id1 = fs1->id;
            id2 = fs2->id;
            id3 = fs3->id;
            id4 = fs4->id;


            if ( q_partialMatch )
            {
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs1,
                                     fs2, (Frag *) 0, (Frag *) 0, q_partialMatch );
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs2,
                                     fs1, (Frag *) 0, (Frag *) 0, q_partialMatch );
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs3,
                                     fs4, (Frag *) 0, (Frag *) 0, q_partialMatch );
                PartialMatchFeatures(query, 1, q1, q2, (Frag *) 0, (Frag *) 0, str, fs4,
                                     fs3, (Frag *) 0, (Frag *) 0, q_partialMatch );
            }


            else
            {
                if ( q1->featureDiff[id1] == -1.0 )
                    q1->featureDiff[id1] = compareFeatures( query, q1, str, fs1, -1, -1 );
                if ( q1->featureDiff[id2] == -1.0 )
                    q1->featureDiff[id2] = compareFeatures( query, q1, str, fs2, -1, -1 );
                if ( q2->featureDiff[id1] == -1.0 )
                    q2->featureDiff[id1] = compareFeatures( query, q2, str, fs1, -1, -1 );
                if ( q2->featureDiff[id2] == -1.0 )
                    q2->featureDiff[id2] = compareFeatures( query, q2, str, fs2, -1, -1 );
                if ( q1->featureDiff[id3] == -1.0 )
                    q1->featureDiff[id3] = compareFeatures( query, q1, str, fs3, -1, -1 );
                if ( q1->featureDiff[id4] == -1.0 )
                    q1->featureDiff[id4] = compareFeatures( query, q1, str, fs4, -1, -1 );
                if ( q2->featureDiff[id3] == -1.0 )
                    q2->featureDiff[id3] = compareFeatures( query, q2, str, fs3, -1, -1 );
                if ( q2->featureDiff[id4] == -1.0 )
                    q2->featureDiff[id4] = compareFeatures( query, q2, str, fs4, -1, -1 );
            }


            if ( q_featureFactor > 0.0 )
            {
                dval[0] = q1->featureDiff[id1] + q2->featureDiff[id2];
                dval[1] = q1->featureDiff[id2] + q2->featureDiff[id1];
                dval[2] = q1->featureDiff[id3] + q2->featureDiff[id4];
                dval[3] = q1->featureDiff[id4] + q2->featureDiff[id3];
            }
            else
                dval[0] = dval[1] = dval[2] = dval[3] = 0.0;

            hevCnts[0] = hevCnts[1] = fs1->hevCnt + fs2->hevCnt;
            hevCnts[2] = hevCnts[3] = fs3->hevCnt + fs4->hevCnt;

            max3 = 4;
            for ( k = 0; k < max3; k++ )
            {
                if ( hevCnts[k] > q_minSubsetSize )
                {
                    if ( dval[k] < best )
                    {
                        best = bestsub = dval[k];
                        bestQ = i;
                        bestStr = j;
                        bestIdx = k;
                        SubIsBetter = 1;
                    }
                    else if ( dval[k] < bestsub )
                    {
                        bestsub = dval[k];
                    }
                }
                if ( dval[k] <= q_bailout && qs2->subsetMap[j] == 0 )
                {
                    qs2->subsetMap[j] = 1;
                }
            }
        }
    }

    if ( str->s3cnt > query->allocSubsetMap )
        query->allocSubsetMap = str->s3cnt;
} /* end of subset */


if ( best < 0.0 )
    best = 0.0;

#if 0
fprintf(stderr, "%d of %d 2p skipped %d of %d 3p skipped best: %8.4lf \n",
        combo2 - nskip2, combo2, combo3 - nskip3, combo3, sqrt(best) );
#endif

return sqrt(best);
}
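
The comparison routine above fills each featureDiff entry only while it still holds the sentinel -1.0, then scores both pairings of the fragments and keeps the smaller sum. A minimal sketch of that pattern follows. It is an illustration only: UNSET, compare_stub, cached_diff and best_of_two are hypothetical names, the stub stands in for compareFeatures, and the sketch returns the squared score (the listing returns its square root).

```c
#include <assert.h>
#include <stddef.h>

#define UNSET (-1.0)   /* sentinel: difference not yet computed */

/* Hypothetical stand-in for compareFeatures(): squared difference of two scalars. */
static int n_calls = 0;
static double compare_stub(double a, double b)
{
    n_calls++;
    return (a - b) * (a - b);
}

/* Return cache[id], computing and storing it on first use only. */
static double cached_diff(double *cache, int id, double a, double b)
{
    assert(cache != NULL && id >= 0);
    if (cache[id] == UNSET)
        cache[id] = compare_stub(a, b);
    return cache[id];
}

/* Score both orientations of a two-fragment pairing; return the smaller squared score. */
static double best_of_two(double q1, double q2, double s1, double s2)
{
    double c1[2] = { UNSET, UNSET }, c2[2] = { UNSET, UNSET };
    double d0 = cached_diff(c1, 0, q1, s1) + cached_diff(c2, 1, q2, s2);
    double d1 = cached_diff(c1, 1, q1, s2) + cached_diff(c2, 0, q2, s1);
    double best = d0 < d1 ? d0 : d1;
    if (best < 0.0)
        best = 0.0;
    return best;
}
```

The sentinel works here because the stubbed differences are never negative; a scoring function that could legitimately return -1.0 would need a separate "computed" flag.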

void TOP_FREE_RESULT(top_result *res, int freeRef )
{
    int i;

    if ( !res )
        return;

    for (i = 0; i < 3; i++ )
    {
        if ( res->strFrags[i] )
            DB_CT_DELETE_CT( res->strFrags[i] );
    }

    if ( freeRef )
        free((char *) res );
}


static char tempString[200]; 

struct top_graph *TOP_INIT_GRAPH( struct top_graph *g, struct CtConnectionTable *ct ) {
/* =========================================================== */
/* (re) initializes topomer graph info *g for structure *ct */

    int b, nowats, nowbds, nowmax, ntoats, toats[20], ntoats2, na, nb, bct, inRing;
    struct top_graph *gnew;
    struct bond_top_rec *bptr;
    set_ptr end_atoms=NIL, nuls=NIL, cnats=NIL, nxcn=NIL, a2chk=NIL,
            TOP_CONN_ATOMS();
    CtBondTypeDef bType;

    if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &nowats ) ||
        !DB_CT_GET_CT_ATTR( ct, CtCtBondCount, &nowbds ) )
        goto error;

/* be sure rings were perceived */ 

if (!DB_CT_UTL_FIND_RINGS( ct )) goto error; 

    /* (re)allocate all memory required by this structure, excepting sets of to_atts */
    if (g) {
        /* free all dependent memory */
        for (b = 0; b < g->nbonds; b++) if (g->bstuff[b].detail) {
            if (!UTL_SET_DESTROY( g->bstuff[b].detail->to_atts ) ) goto error;
            if (!UTL_MEM_FREE( g->bstuff[b].detail ) ) goto error;
            g->bstuff[b].detail = (struct bond_detail_rec *) 0;
        }

        /* if this molecule is bigger, reallocate dependent data arrays */
        if (nowats > g->maxatoms) {
            nowmax = (nowats > g->maxatoms * 2 ? nowats : g->maxatoms * 2 );
            if (!( g->bstart = (int *) DB_CT_UTL_REALLOC(
                    ( char * ) g->bstart, sizeof(int) * nowmax ) ) ) goto error;
            g->maxatoms = nowmax;
        }

        /* note that bonds are 2x more because they are stored rooted from both ends */
        if (2 * nowbds > g->maxbonds ) {
            nowmax = (2 * nowbds > g->maxbonds ? 2 * nowbds : g->maxbonds );
            if (!( g->bstuff = (struct bond_top_rec *) DB_CT_UTL_REALLOC(
                    ( char * ) g->bstuff, sizeof(struct bond_top_rec) * nowmax ) ) ) goto error;
            g->maxbonds = nowmax;
        }
        gnew = g;
    }
    else {
        if (!(gnew = (struct top_graph *) UTL_MEM_ALLOC( sizeof( struct top_graph ) ) ))
            goto error;
        if (!(gnew->bstart = (int *) UTL_MEM_ALLOC( sizeof( int ) * 1000 ) )) goto error;
        gnew->maxatoms = 1000;
        if (!(gnew->bstuff = (struct bond_top_rec *)
                UTL_MEM_ALLOC( sizeof(struct bond_top_rec) * 2000 ) )) goto error;
        gnew->maxbonds = 2000;
    }

    gnew->natoms = nowats;
    gnew->nbonds = nowbds;

    if (!(a2chk = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(nuls = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(cnats = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(nxcn = UTL_SET_CREATE( nowats + 1 ) )) goto error;
    if (!(end_atoms = UTL_SET_CREATE( nowats + 1 ) )) goto error;

    /* fill in tree information */
    bptr = gnew->bstuff;
    bct = 0;
    for (na = 1; na <= nowats; na++) {
        if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, na, CtAtomBondCount, &ntoats ) )) goto error;
        if (ntoats > 20) {
            fprintf( stderr, "More than 20 bonds to atom %d.\n", na );
            goto error;
        }
        if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, na, CtAtomBondToAtoms, &toats ) )) goto error;
        gnew->bstart[na - 1] = bct;
        for (nb = 0; nb < ntoats; nb++, bct++, bptr++ ) {
            bptr->from = na;
            bptr->to = toats[ nb ];
            /* is this a topomerically labile bond? */
            if (!(b = DB_CT_UTL_GET_BONDID( ct, na, bptr->to ) )) goto error;
            if (!DB_CT_GET_BOND_ATTR( ct, b, CtBondIsInRing, &inRing )
                || !DB_CT_GET_BOND_ATTR( ct, b, CtBondType, &bType )
                || !DB_CT_GET_ANY_ATOM_ATTR( ct, toats[ nb ],
                                             CtAtomBondCount, &ntoats2 ) ) goto error;
            if (!inRing && bType == CtBondTypeSingle && ntoats > 1 && ntoats2 > 1) {
                /*
                if (!(bptr->to_atts = TOP_CONN_ATOMS( ct, bptr->to, bptr->from,
                        nuls, cnats, nxcn, end_atoms ) )) goto error;
                */
                if (!(TOP_MARK_BEST( ct, bptr->to, bptr->from, TRUE, bptr, NIL,
                                     NIL, NIL,
                                     a2chk, nuls, cnats, nxcn, end_atoms ) )) goto error;
            }
            else bptr->detail = (struct bond_detail_rec *) 0;
        }
    }

    if (end_atoms) UTL_SET_DESTROY(end_atoms);
    if (nuls) UTL_SET_DESTROY(nuls);
    if (nxcn) UTL_SET_DESTROY(nxcn);
    if (cnats) UTL_SET_DESTROY(cnats);
    /* if(a2chk) UTL_SET_DESTROY(a2chk); jilek (to do) was cnats */

    return gnew;

error:
    return (struct top_graph *) 0;
}
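
TOP_INIT_GRAPH marks a bond for topomer processing only when it is acyclic, single, and joins two non-terminal atoms, and it grows the atom array to at least double its old size. The sketch below restates those two rules in isolation. It is not the toolkit's API: enum bond_type, is_labile_bond and grown_capacity are hypothetical names, and the "greater than 1" test on the second atom assumes the reading of the damaged line in the listing.

```c
#include <assert.h>

/* Hypothetical bond-type codes; the listing uses CtBondTypeSingle from its own toolkit. */
enum bond_type { BOND_SINGLE = 1, BOND_DOUBLE = 2, BOND_TRIPLE = 3 };

/* A bond is treated as labile when it is outside any ring, single,
   and neither end atom is terminal (each has more than one neighbor). */
static int is_labile_bond(int in_ring, enum bond_type type,
                          int degree_from, int degree_to)
{
    assert(degree_from >= 1 && degree_to >= 1);
    return !in_ring && type == BOND_SINGLE && degree_from > 1 && degree_to > 1;
}

/* New array capacity once `needed` exceeds `current`:
   double the old capacity, or take `needed` if that is larger still. */
static int grown_capacity(int current, int needed)
{
    assert(current > 0 && needed > current);
    return needed > current * 2 ? needed : current * 2;
}
```

Doubling keeps repeated reinitialization for slightly larger molecules from reallocating on every call.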

set_ptr TOP_CONN_ATOMS(
/* returns the set of all atoms in *ct which are attached to atom1,
   except that any path ending in atom2 is truncated.
   The returned set is created here (to be freed by user when finished).
   For efficiency in reprocessing the same structure,
   four working sets are supplied by caller */
struct CtConnectionTable *ct,
int atom1,
int atom2,
set_ptr nuls, set_ptr cnats, set_ptr nxcn, set_ptr end_atoms )
{
    int natot, ntoats, toats[20], natt, nats, elem, nuats;
    set_ptr a2chk=NIL;

    if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    UTL_SET_CLEAR( end_atoms );
    UTL_SET_INSERT( end_atoms, atom2 );
    if (!(a2chk = UTL_SET_CREATE( natot + 1 ) )) goto error;

    /* root at first set of attached atoms */
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atom1, CtAtomBondCount, &ntoats ) )) goto error;
    if (ntoats > 20) goto toomanyattms;
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atom1, CtAtomBondToAtoms, &toats ) )) goto error;
    for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( a2chk, toats[ natt ] );

    if (UTL_SET_EMPTY( a2chk )) return( FALSE );
    UTL_SET_DIFF_INPLACE( a2chk, end_atoms, a2chk );
    nats = UTL_SET_CARDINALITY( a2chk );
    UTL_SET_COPY_INPLACE( cnats, a2chk );

    /* breadth first search */
    while (TRUE) {
        UTL_SET_CLEAR( nxcn );
        elem = -1;
        while ( (elem = UTL_SET_NEXT( cnats, elem )) >= 0 ) {
            UTL_SET_CLEAR( nuls );
            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondCount, &ntoats ) )) goto error;
            if (ntoats > 20) goto toomanyattms;
            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondToAtoms, &toats ) )) goto error;
            for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( nuls, toats[ natt ] );
            UTL_SET_DELETE( nuls, atom1 );
            UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls );
            UTL_SET_OR_INPLACE( nxcn, nuls, nxcn );
            UTL_SET_DIFF_INPLACE( nxcn, a2chk, nxcn );
        }
        UTL_SET_OR_INPLACE( a2chk, nxcn, a2chk );
        nuats = UTL_SET_CARDINALITY( a2chk );
        if (nuats <= nats) break;
        nats = nuats;
        UTL_SET_COPY_INPLACE( cnats, nxcn );
    }

    return a2chk;

error:
    return (set_ptr) NIL;

toomanyattms:
    fprintf( stderr, "More than twenty atoms attached to some atom in this structure.\n" );
    goto error;
}
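
TOP_CONN_ATOMS collects, generation by generation, every atom reachable from atom1 while refusing to pass through atom2. The same breadth-first idea can be sketched without the set library. This is an illustration under stated assumptions: side_atoms and MAX_ATOMS are hypothetical, atoms are numbered from 0 rather than 1, and a small adjacency matrix replaces the connection table.

```c
#include <assert.h>
#include <string.h>

#define MAX_ATOMS 32

/* Mark in `seen` every atom reachable from atom1 without passing through atom2.
   adj[i][j] != 0 means atoms i and j are bonded; atoms are numbered 0..n-1.
   Neither atom1 nor atom2 is marked. Returns the number of atoms marked. */
static int side_atoms(int n, unsigned char adj[MAX_ATOMS][MAX_ATOMS],
                      int atom1, int atom2, unsigned char seen[MAX_ATOMS])
{
    int queue[MAX_ATOMS], head = 0, tail = 0, count = 0, i;

    assert(n > 0 && n <= MAX_ATOMS);
    memset(seen, 0, MAX_ATOMS);

    /* seed with the neighbors of atom1, skipping atom2 */
    for (i = 0; i < n; i++)
        if (adj[atom1][i] && i != atom1 && i != atom2 && !seen[i]) {
            seen[i] = 1;
            queue[tail++] = i;
            count++;
        }

    /* breadth-first expansion; each atom enters the queue at most once */
    while (head < tail) {
        int a = queue[head++];
        for (i = 0; i < n; i++)
            if (adj[a][i] && i != atom1 && i != atom2 && !seen[i]) {
                seen[i] = 1;
                queue[tail++] = i;
                count++;
            }
    }
    return count;
}
```

For the chain 0-1-2-3-4 with atom1 = 2 and atom2 = 1, only atoms 3 and 4 are marked, which is the side of the broken bond that a topomer fragment would keep.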

int TOP_MARK_BEST(
/* adds information for prioritizing attachments to an atom */
struct CtConnectionTable *ct,
int a1,              /* the root atom */
int a2,              /* the base of the root - skip it */
int full_data,       /* provide information relating to near symmetries? + attached sets */
struct bond_top_rec *bptr, /* output here if full_data=TRUE */
int *only_atoms,     /* output here if full_data=FALSE */
double *coo_in,      /* atomic coords (retrieved from ct if not provided) */
set_ptr attach3set,  /* if provided, a super root atom(s)
                        for entire group (highest priority path is shortest to here) */
set_ptr a2chk, set_ptr nuls, set_ptr cnats, set_ptr nxcn, set_ptr end_atoms )
{
# define MAX_NP 8

    struct pathrec {
        int root, nrings, chosen, nats, done, a3id;
        double mw;
        set_ptr path, nxtls;
    };

    struct pathrec p[MAX_NP];
    int retval, toroot, ntoats, toats[20], natt, a, np, growing, nats, natot, ncycles, pnow, ringclosed,
        debug=FALSE;
    int nuats, elem, new_rings, pdone, p2do, best, decision, naout, lastnats = 0, lastdecision, arec2,
        a4;
    double *coo, t1, t2, diff, pot1, pot2, podiff, get_path_mw();

    np = 0;

    if (!(coo = coo_in)) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &coo, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;

    if (full_data) if (!( bptr->detail = (struct bond_detail_rec *)
            UTL_MEM_CALLOC( sizeof( struct bond_detail_rec ), 1 ) )) goto error;

    toroot = attach3set || !a2;
    UTL_SET_CLEAR( end_atoms );
    if (a2) UTL_SET_INSERT( end_atoms, a2 );
    arec2 = a2;

    UTL_SET_CLEAR( a2chk );
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondCount, &ntoats ) )) goto error;
    if (ntoats > 20) goto toomanyattms;
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondToAtoms, &toats ) )) goto error;
    for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( a2chk, toats[ natt ] );
    if (a2) UTL_SET_DELETE( a2chk, a2 );

    /* initialize path records */
    a = -1;
    np = 0;
    while (np < MAX_NP && (a = UTL_SET_NEXT( a2chk, a )) >= 0 ) {
        if (!(p[np].path = UTL_SET_CREATE( natot + 1 ) )) goto error;
        if (!(p[np].nxtls = UTL_SET_CREATE( natot + 1 ) )) goto error;
        p[np].root = a;
        p[np].nrings = p[np].done = p[np].a3id = 0;
        UTL_SET_INSERT( p[np].path, a );
        np++;
    }

    /* grow the paths */
    growing = TRUE;
    nats = 0;
    ncycles = 0;
    while (growing ) {
        nuats = 0;
        ringclosed = FALSE;
        for (pnow = 0; pnow < np; pnow++ ) if (!p[pnow].done) {
            UTL_SET_COPY_INPLACE( cnats, p[pnow].path );
            UTL_SET_CLEAR( nxcn );
            elem = -1;
            /* accumulate this generation of attached atoms into nxcn */
            while ( (elem = UTL_SET_NEXT( cnats, elem )) >= 0 ) {
                UTL_SET_CLEAR( nuls );
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondCount, &ntoats ) )) goto error;
                if (ntoats > 20) goto toomanyattms;
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondToAtoms, &toats ) )) goto error;
                for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( nuls, toats[ natt ] );
                UTL_SET_DELETE( nuls, a1 );
                UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls );
                UTL_SET_OR_INPLACE( nxcn, nuls, nxcn );
                UTL_SET_DIFF_INPLACE( nxcn, p[pnow].path, nxcn );
            }
            UTL_SET_COPY_INPLACE( p[pnow].nxtls, nxcn );
        }

        /* mark if reached root */
        for (pnow = 0; pnow < np; pnow++) {
            /* remove duplicate atoms caused by new ring closure */
            for (pdone = 0; pdone < np; pdone++ ) if (pdone != pnow) {
                UTL_SET_AND_INPLACE( p[pnow].path, p[pdone].nxtls, a2chk );
                if ((new_rings = UTL_SET_CARDINALITY( a2chk ))) {
                    /* we have ring closure(s) */
                    ringclosed = TRUE;
                    UTL_SET_OR_INPLACE( end_atoms, a2chk, end_atoms );
                    UTL_SET_DIFF_INPLACE( p[pdone].nxtls, a2chk, p[pdone].nxtls );
                }
            }
            /* stop growing a path that has reached anything in attach3set */
            if (toroot) {
                elem = -1;
                while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                    if (UTL_SET_MEMBER( p[pnow].path, elem ) ) {
                        p[pnow].done = TRUE;
                        break;
                    }
                }
            }
        }

        /* add all OK new atoms to all paths */
        for (pnow = 0; pnow < np; pnow++) {
            UTL_SET_OR_INPLACE( p[pnow].path, p[pnow].nxtls, p[pnow].path );
            UTL_SET_CLEAR( p[pnow].nxtls );
        }

        /* done growing paths if no more atoms added to any path */
        for (pdone = 0, nuats = 0; pdone < np; pdone++ )
            nuats += UTL_SET_CARDINALITY( p[pdone].path );
        if (nuats <= nats && !ringclosed) growing = FALSE;
        nats = nuats;
        /* or after 100 atom layers out regardless */
        ncycles++;
        if (ncycles >= 100) growing = FALSE;
    }

    /* debugging */
    if (debug) for (pdone = 0; pdone < np; pdone++) {
        sprintf( tempString, "Path %d (from %d): ",
                 pdone + 1, p[pdone].root );
        fprintf( stdout, tempString );
        ashow( p[pdone].path );
    }

    if (full_data) {
        if (!( bptr->detail->to_atts = UTL_SET_CREATE( natot + 1 ) )) goto error;
        UTL_SET_INSERT( bptr->detail->to_atts, a1 );
    }

    /* compute the path properties */
    for (pdone = 0; pdone < np; pdone++) {
        p[pdone].chosen = toroot;
        if (toroot) {
            p[pdone].chosen = FALSE;
            elem = -1;
            while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                if (UTL_SET_MEMBER( p[pdone].path, elem ) ) {
                    /* recording atom ID for later use */
                    p[pdone].chosen = TRUE;
                    p[pdone].a3id = elem;
                    arec2 = p[pdone].root;
                    break;
                }
            }
        }
        p[pdone].nats = UTL_SET_CARDINALITY( p[pdone].path );
        p[pdone].nrings = p[pdone].nrings ? 1 : 0;
        p[pdone].mw = 0.0;
        p[pdone].done = 0;
        if (full_data) UTL_SET_OR_INPLACE( bptr->detail->to_atts, p[pdone].path,
                                           bptr->detail->to_atts );
    }

    /* return all root atoms, ordered best to worst */
    for (p2do = 0; p2do < np; p2do++ ) {
        /* start with first unchosen atom */
        for (pdone = 0; pdone < np; pdone++) if (!p[pdone].done) {
            best = pdone;
            break;
        }

        /* look for something better */
        for (pdone = 0; pdone < np; pdone++) if (!p[pdone].done && pdone != best) {
            decision = FALSE;
            if (p[best].chosen != p[pdone].chosen) {
                decision = TRUE;
                if (!p[best].chosen && p[pdone].chosen) best = pdone;
            }
            if (!decision) {
                if (p[pdone].nats != p[best].nats ) {
                    decision = TRUE;
                    if (p[pdone].nats > p[best].nats) best = pdone;
                }
            }
            if (!decision) {
                p[pdone].mw = get_path_mw( p[pdone].path, ct, p[pdone].mw );
                p[best].mw = get_path_mw( p[best].path, ct, p[best].mw );
                if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw ||
                    p[pdone].mw - p[best].mw < -0.01 * p[best].mw ) {
                    decision = TRUE;
                    if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw) best = pdone;
                }
            }

            /* checking relative geometries of attachments via "improper" torsion */
            /* the phenyl ether problem - if candidates are 180 degrees apart and we are on the
               root side of the torsion, pick the atom to the "right", not the "left", of the main chain */
            if (!decision && toroot && p[pdone].a3id ) {
                /* are we 180 apart? */
                a4 = p[pdone].a3id;
                pot1 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                     coo+(p[best].root-1)*3 );
                pot2 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                     coo+(p[pdone].root-1)*3 );
                podiff = pot1 - pot2;
                while (podiff < 0.0) podiff += 360.0;
                while (pot2 < 0.0) pot2 += 360.0;
                if (podiff < 190.0 && podiff > 170.0 ) {
                    decision = TRUE;
                    if (pot2 < 180.0) best = pdone;
                }
            }

            if (!decision) {
                /* if not already set, according to the previous special case, then */
                /* if torsions differ by 360 degrees then we have trans, prefer the +180 */
                t1 = UTL_GEOM_TAU( coo+(p[pdone].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                   coo+(p[best].root-1)*3 );
                t2 = UTL_GEOM_TAU( coo+(p[best].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                   coo+(p[pdone].root-1)*3 );
                diff = t1 - t2;
                if (diff > 355.0) best = pdone;
                else if (diff > -355.0) {
                    while (t1 < 0.0) t1 += 360.0;
                    if (t1 > 170.0 && t1 <= 350.0) best = pdone;
                }
            }
        }

        /* output all information about this atom */
        if (p2do < 3) {
            if (full_data) {
                if (p2do) {
                    bptr->detail->identical[ p2do - 1 ] = lastdecision ? 1 : 0;
                    bptr->detail->natlvs2[ p2do - 1 ] = lastnats - p[best].nats;
                    bptr->detail->lastnat[ p2do - 1 ] = p[best].nats;
                }
                bptr->detail->best[ p2do ] = p[best].root;
            } else only_atoms[ p2do ] = p[best].root;
        }

        lastnats = p[best].nats;
        lastdecision = decision;
        p[best].done = TRUE;
    }

    retval = TRUE;
error:
    retval = TRUE;
    for (pnow = 0; pnow < np; pnow++ ) {
        if (p[pnow].path) UTL_SET_DESTROY(p[pnow].path);
        if (p[pnow].nxtls) UTL_SET_DESTROY(p[pnow].nxtls);
    }
    return( retval );

toomanyattms:
    fprintf( stderr, "Too many attachments to an atom (>20)\n" );
    goto error;
}
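
TOP_MARK_BEST ranks the branches leaving an atom by a fixed sequence of tie-breakers: a branch that reaches attach3set first, then the branch with more atoms, then the heavier branch when the weights differ by more than 1%, and only then torsion geometry. The non-geometric part of that ordering can be sketched as a single comparison. The names struct branch and outranks are hypothetical, and the torsion fallback is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one candidate branch; mirrors fields of the listing's pathrec. */
struct branch {
    int    chosen;  /* nonzero if the branch reaches the super-root set */
    int    nats;    /* number of atoms in the branch */
    double mw;      /* summed atomic weight of the branch */
};

/* Compare candidate b against the current best a.
   Returns 1 if b outranks a, 0 if a keeps its place, and -1 if these
   tie-breakers cannot decide (the listing then compares torsion angles). */
static int outranks(const struct branch *a, const struct branch *b)
{
    double d;

    assert(a != NULL && b != NULL);
    if (a->chosen != b->chosen)
        return b->chosen && !a->chosen;
    if (a->nats != b->nats)
        return b->nats > a->nats;
    d = b->mw - a->mw;
    if (d > 0.01 * a->mw)
        return 1;
    if (d < -0.01 * a->mw)
        return 0;
    return -1;
}
```

The 1% weight tolerance means near-identical branches fall through to the geometric tests instead of being ordered by rounding noise.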

#if 0

/* adds information for prioritizing attachments to an atom */
static int topMarkBest(
Frag *fragP,
struct CtConnectionTable *ct,
int *atoms,          /* sizeof ct->atomCount, true false for each atom to use */
int a1,              /* the root atom */
int a2,              /* the base of the root - skip it */
int full_data,       /* provide information relating to near symmetries? + attached sets */
{
#if 0
struct bond_top_rec *bptr, /* output here if full_data=TRUE */
int *only_atoms,     /* output here if full_data=FALSE */
double *coo_in,      /* atomic coords (retrieved from ct if not provided) */
set_ptr attach3set,  /* if provided, a super root atom(s)
                        for entire group (highest priority path is shortest to here) */
set_ptr a2chk, set_ptr nuls, set_ptr cnats, set_ptr nxcn, set_ptr end_atoms )
#endif

#define MAX_NP 8
    struct pathrec {
        int root, nrings, chosen, nats, done, a3id;
        double mw;
        set_ptr path, nxtls;
    };

    struct pathrec p[MAX_NP];
    int retval, toroot, ntoats, toats[20], natt, a, np, growing, nats, natot, ncycles, pnow, ringclosed,
        debug=FALSE;
    int nuats, elem, new_rings, pdone, p2do, best, decision, naout, lastnats = 0, lastdecision, arec2,
        a4;
    double *coo, t1, t2, diff, pot1, pot2, podiff, get_path_mw();
    set_ptr a2chk;

    np = 0;
    if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &coo, &natot )) goto error;
    natot = ct->atomCount;

#if 0
    toroot = attach3set || !a2;
    UTL_SET_CLEAR( end_atoms );
    if (a2) UTL_SET_INSERT( end_atoms, a2 );
    arec2 = a2;
#endif

    a2chk = UTL_SET_CREATE(natot + 1);
    UTL_SET_CLEAR( a2chk );
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondCount, &ntoats ) )) goto error;
    if (ntoats > 20) goto toomanyattms;
    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, a1, CtAtomBondToAtoms, &toats ) )) goto error;
    for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( a2chk, toats[ natt ] );
#if 0
    if (a2) UTL_SET_DELETE( a2chk, a2 );
#endif

    /* initialize path records */
    a = -1;
    np = 0;
    while (np < MAX_NP && (a = UTL_SET_NEXT( a2chk, a )) >= 0 ) {
        if (!(p[np].path = UTL_SET_CREATE( natot + 1 ) )) goto error;
        if (!(p[np].nxtls = UTL_SET_CREATE( natot + 1 ) )) goto error;
        p[np].root = a;
        p[np].nrings = p[np].done = p[np].a3id = 0;
        UTL_SET_INSERT( p[np].path, a );
        np++;
    }

    /* grow the paths */
    growing = TRUE;
    nats = 0;
    ncycles = 0;
    while (growing ) {
        nuats = 0;
        ringclosed = FALSE;
        for (pnow = 0; pnow < np; pnow++ ) if (!p[pnow].done) {
            UTL_SET_COPY_INPLACE( cnats, p[pnow].path );
            UTL_SET_CLEAR( nxcn );
            elem = -1;
            /* accumulate this generation of attached atoms into nxcn */
            while ( (elem = UTL_SET_NEXT( cnats, elem )) >= 0 ) {
                UTL_SET_CLEAR( nuls );
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondCount, &ntoats ) )) goto error;
                if (ntoats > 20) goto toomanyattms;
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomBondToAtoms, &toats ) )) goto error;
                for (natt = 0; natt < ntoats; natt++) UTL_SET_INSERT( nuls, toats[ natt ] );
                UTL_SET_DELETE( nuls, a1 );
                UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls );
                UTL_SET_OR_INPLACE( nxcn, nuls, nxcn );
                UTL_SET_DIFF_INPLACE( nxcn, p[pnow].path, nxcn );
            }
            UTL_SET_COPY_INPLACE( p[pnow].nxtls, nxcn );
        }

        /* mark if reached root */
        for (pnow = 0; pnow < np; pnow++) {
            /* remove duplicate atoms caused by new ring closure */
            for (pdone = 0; pdone < np; pdone++ ) if (pdone != pnow) {
                UTL_SET_AND_INPLACE( p[pnow].path, p[pdone].nxtls, a2chk );
                if ((new_rings = UTL_SET_CARDINALITY( a2chk ))) {
                    /* we have ring closure(s) */
                    ringclosed = TRUE;
                    UTL_SET_OR_INPLACE( end_atoms, a2chk, end_atoms );
                    UTL_SET_DIFF_INPLACE( p[pdone].nxtls, a2chk, p[pdone].nxtls );
                }
            }
            /* stop growing a path that has reached anything in attach3set */
            if (toroot) {
                elem = -1;
                while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                    if (UTL_SET_MEMBER( p[pnow].path, elem ) ) {
                        p[pnow].done = TRUE;
                        break;
                    }
                }
            }
        }

        /* add all OK new atoms to all paths */
        for (pnow = 0; pnow < np; pnow++) {
            UTL_SET_OR_INPLACE( p[pnow].path, p[pnow].nxtls, p[pnow].path );
            UTL_SET_CLEAR( p[pnow].nxtls );
        }

        /* done growing paths if no more atoms added to any path */
        for (pdone = 0, nuats = 0; pdone < np; pdone++ )
            nuats += UTL_SET_CARDINALITY( p[pdone].path );
        if (nuats <= nats && !ringclosed) growing = FALSE;
        nats = nuats;
        /* or after 100 atom layers out regardless */
        ncycles++;
        if (ncycles >= 100) growing = FALSE;
    }

    /* debugging */
    if (debug) for (pdone = 0; pdone < np; pdone++) {
        sprintf( tempString, "Path %d (from %d): ",
                 pdone + 1, p[pdone].root );
        fprintf( stdout, tempString );
        ashow( p[pdone].path );
    }

    if (full_data) {
        if (!( bptr->detail->to_atts = UTL_SET_CREATE( natot + 1 ) )) goto error;
        UTL_SET_INSERT( bptr->detail->to_atts, a1 );
    }

    /* compute the path properties */
    for (pdone = 0; pdone < np; pdone++) {
        p[pdone].chosen = toroot;
        if (toroot) {
            p[pdone].chosen = FALSE;
            elem = -1;
            while ((elem = UTL_SET_NEXT( attach3set, elem )) >= 0 ) {
                if (UTL_SET_MEMBER( p[pdone].path, elem ) ) {
                    /* recording atom ID for later use */
                    p[pdone].chosen = TRUE;
                    p[pdone].a3id = elem;
                    arec2 = p[pdone].root;
                    break;
                }
            }
        }
        p[pdone].nats = UTL_SET_CARDINALITY( p[pdone].path );
        p[pdone].nrings = p[pdone].nrings ? 1 : 0;
        p[pdone].mw = 0.0;
        p[pdone].done = 0;
        if (full_data) UTL_SET_OR_INPLACE( bptr->detail->to_atts, p[pdone].path,
                                           bptr->detail->to_atts );
    }

    /* return all root atoms, ordered best to worst */
    for (p2do = 0; p2do < np; p2do++ ) {
        /* start with first unchosen atom */
        for (pdone = 0; pdone < np; pdone++ ) if (!p[pdone].done) {
            best = pdone;
            break;
        }

        /* look for something better */
        for (pdone = 0; pdone < np; pdone++) if (!p[pdone].done && pdone != best) {
            decision = FALSE;
            if (p[best].chosen != p[pdone].chosen) {
                decision = TRUE;
                if (!p[best].chosen && p[pdone].chosen) best = pdone;
            }
            if (!decision) {
                if (p[pdone].nats != p[best].nats ) {
                    decision = TRUE;
                    if (p[pdone].nats > p[best].nats) best = pdone;
                }
            }
            if (!decision) {
                p[pdone].mw = get_path_mw( p[pdone].path, ct, p[pdone].mw );
                p[best].mw = get_path_mw( p[best].path, ct, p[best].mw );
                if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw ||
                    p[pdone].mw - p[best].mw < -0.01 * p[best].mw ) {
                    decision = TRUE;
                    if (p[pdone].mw - p[best].mw > 0.01 * p[best].mw) best = pdone;
                }
            }

            /* checking relative geometries of attachments via "improper" torsion */
            /* the phenyl ether problem - if candidates are 180 degrees apart and we are on the
               root side of the torsion, pick the atom to the "right", not the "left", of the main chain */
            if (!decision && toroot && p[pdone].a3id ) {
                /* are we 180 apart? */
                a4 = p[pdone].a3id;
                pot1 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                     coo+(p[best].root-1)*3 );
                pot2 = UTL_GEOM_TAU( coo+(a4-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                     coo+(p[pdone].root-1)*3 );
                podiff = pot1 - pot2;
                while (podiff < 0.0) podiff += 360.0;
                while (pot2 < 0.0) pot2 += 360.0;
                if (podiff < 190.0 && podiff > 170.0 ) {
                    decision = TRUE;
                    if (pot2 < 180.0) best = pdone;
                }
            }

            if (!decision) {
                /* if not already set, according to the previous special case, then */
                /* if torsions differ by 360 degrees then we have trans, prefer the +180 */
                t1 = UTL_GEOM_TAU( coo+(p[pdone].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                   coo+(p[best].root-1)*3 );
                t2 = UTL_GEOM_TAU( coo+(p[best].root-1)*3, coo+(a1-1)*3, coo+(arec2-1)*3,
                                   coo+(p[pdone].root-1)*3 );
                diff = t1 - t2;
                if (diff > 355.0) best = pdone;
                else if (diff > -355.0) {
                    while (t1 < 0.0) t1 += 360.0;
                    if (t1 > 170.0 && t1 <= 350.0) best = pdone;
                }
            }
        }

        /* output all information about this atom */
        if (p2do < 3) {
            if (full_data) {
                if (p2do) {
                    bptr->detail->identical[ p2do - 1 ] = lastdecision ? 1 : 0;
                    bptr->detail->natlvs2[ p2do - 1 ] = lastnats - p[best].nats;
                    bptr->detail->lastnat[ p2do - 1 ] = p[best].nats;
                }
                bptr->detail->best[ p2do ] = p[best].root;
            } else only_atoms[ p2do ] = p[best].root;
        }

        lastnats = p[best].nats;
        lastdecision = decision;
        p[best].done = TRUE;
    }

    retval = TRUE;
error:
    retval = TRUE;
    for (pnow = 0; pnow < np; pnow++ ) {
        if (p[pnow].path) UTL_SET_DESTROY(p[pnow].path);
        if (p[pnow].nxtls) UTL_SET_DESTROY(p[pnow].nxtls);
    }
    return( retval );

toomanyattms:
    fprintf( stderr, "Too many attachments to an atom (>20)\n" );
    goto error;
}

#endif

static double get_path_mw( set_ptr aset, struct CtConnectionTable *ct, double mw )
/* returns the total atomic weight of all atoms in aset */
{
    int elem = -1;
    double aw, ans = 0.0;

    if (mw) return( mw );
    elem = -1;
    while ( (elem = UTL_SET_NEXT( aset, elem )) >= 0 ) {
        if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, elem, CtAtomAtomicWeight, &aw ) )) return( 0.0 );
        ans += aw;
    }
    return( ans );
}


static void ashow( set_ptr aset )
/* for interactive debugging, shows a set's membership in terms of atom ID */
{
    char buff[1000], *b;
    int elem;

    *buff = '\0';
    b = buff;
    elem = -1;
    while ( (elem = UTL_SET_NEXT( aset, elem )) >= 0 ) {
        sprintf(b, " %d", elem);
        b = buff + strlen(buff);
    }
    sprintf(b, "\n" );
    fprintf( stdout, buff );
}

/* CoMFA region descriptor - here it's a hidden data type */

double *TOP_STER_EVAL_RB_ATTEN(
/* =========================================================== */
/* computes and returns a CoMFA steric field, to be freed by caller when done */
struct CtConnectionTable *ct,
l_RegionPtr regp,
int root,            /* atom ID of fragment root */
double *acoord,      /* atomic coordinate array. If NIL, coordinates are retrieved from ct */
set_ptr a2use,       /* optionally, if not NIL, field results only from this set of atoms */
double *ext_vdw_wt ) /* optionally, if not NIL, these are additional user-supplied wts for field
                        calculation */
{
    int natot, nat, ix, iy, iz;
    double *steric=NIL, *AtWts=NIL, *TOP_FIELD_RB_WTS(), *ftemp, *coord, *vAwt=NIL,
           *vBwt=NIL, *va, *vb, *st;
    double radnow, epsnow, diff, dis2, dis6, dis12, x, y, z, atm_steric, sum_steric,
           TOP_GET_ATOM_VDW_RADIUS();

#define MIN_SQ_DISTANCE 1.0e-4
#define RADIUS_C3 1.7
#define EPSILONC3 .107
#define STERIC_MAX 30.0


    /* get coordinates, # atoms, RB attenuation for each atom */
    if ((ftemp = acoord )) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &ftemp, &natot )) goto error;
    if (!(AtWts = TOP_FIELD_RB_WTS( ct, root, a2use ) )) goto cleanup;

    /* compute VDW terms for each atom (not for each atom type as in SYBYL) */
    if (!(vAwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (!(vBwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (regp->box_array[0].atom_type != 1 || regp->n_boxes != 1)
        fprintf( stderr, "WARNING: The C.3 probe atom type in a single box is always used in the steric field calculation.\n" );

    for (nat = 1; nat <= natot; nat++) if (!a2use || UTL_SET_MEMBER( a2use, nat )) {
        radnow = TOP_GET_ATOM_VDW_RADIUS( ct, nat, &epsnow );
        radnow += RADIUS_C3;
        epsnow = sqrt( epsnow * EPSILONC3 );
        vAwt[ nat-1 ] = epsnow * 2.0 * pow( radnow, 6.0 ) * AtWts[ nat-1 ];
        vBwt[ nat-1 ] = epsnow * pow( radnow, 12.0 ) * AtWts[ nat-1 ];
        if (ext_vdw_wt) {
            vAwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
            vBwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
        }
    }

    /* empty output array */
    if (!(steric = (double *) UTL_MEM_CALLOC( regp->n_points, sizeof( double )) )) goto cleanup;
    st = steric;

    /* cycling over output array elements */
    for (iz=0, z=regp->box_array[0].lo[2]; iz < regp->box_array[0].nstep[2]; iz++, z += regp->box_array[0].stepsize[2])
    for (iy=0, y=regp->box_array[0].lo[1]; iy < regp->box_array[0].nstep[1]; iy++, y += regp->box_array[0].stepsize[1])
    for (ix=0, x=regp->box_array[0].lo[0]; ix < regp->box_array[0].nstep[0]; ix++, x += regp->box_array[0].stepsize[0])
    {
        /* cycling over ligand atoms */
        for (nat = 0, coord = ftemp, sum_steric = 0, va = vAwt, vb = vBwt; nat < natot; nat++, va++, vb++)
            if (!a2use || UTL_SET_MEMBER( a2use, nat )) {
                dis2 = x - *coord++; dis2 *= dis2;
                diff = y - *coord++; diff *= diff; dis2 += diff;
                diff = z - *coord++; diff *= diff; dis2 += diff;
                if ( dis2 < MIN_SQ_DISTANCE ) atm_steric = STERIC_MAX * AtWts[ nat ];
                else {
                    dis6 = dis2 * dis2 * dis2;
                    dis12 = dis6 * dis6;
                    atm_steric = (*vb)/dis12 - (*va)/dis6;
                    atm_steric = atm_steric > ( STERIC_MAX * AtWts[ nat ] ) ? STERIC_MAX * AtWts[ nat ] : atm_steric;
                }
                sum_steric += atm_steric;
            }
            else coord += 3;
        *st = sum_steric > STERIC_MAX ? STERIC_MAX : sum_steric;
        st++;
    }

cleanup:
    if (AtWts) UTL_MEM_FREE( AtWts );
    if (vAwt) UTL_MEM_FREE( vAwt );
    if (vBwt) UTL_MEM_FREE( vBwt );

error:
    return( steric );
}
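
The per-atom term accumulated in the loop above is a 6-12 potential between a ligand atom and the C.3 probe, capped at STERIC_MAX times the atom's rotatable-bond weight. The following is a minimal sketch of that single term, assuming the same constants as the listing; the names sketch_atom_steric, SKETCH_MIN_SQ_DISTANCE and SKETCH_STERIC_MAX are hypothetical and are not part of the original source.

```c
#include <assert.h>

#define SKETCH_MIN_SQ_DISTANCE 1.0e-4   /* same value as MIN_SQ_DISTANCE in the listing */
#define SKETCH_STERIC_MAX      30.0     /* same value as STERIC_MAX in the listing */

/* Hypothetical helper: steric energy between one ligand atom and the probe.
   rsum = atom radius + probe radius, eps = geometric-mean well depth,
   wt   = rotatable-bond attenuation weight, dis2 = squared distance. */
double sketch_atom_steric(double rsum, double eps, double wt, double dis2)
{
    double r2, r6, a, b, dis6, e, cap;

    assert(dis2 >= 0.0);
    cap = SKETCH_STERIC_MAX * wt;
    if (dis2 < SKETCH_MIN_SQ_DISTANCE)
        return cap;                      /* probe sits on the atom: use the cap */

    r2 = rsum * rsum;
    r6 = r2 * r2 * r2;
    a = eps * 2.0 * r6 * wt;             /* attractive coefficient (vAwt) */
    b = eps * r6 * r6 * wt;              /* repulsive coefficient (vBwt)  */

    dis6 = dis2 * dis2 * dis2;
    e = b / (dis6 * dis6) - a / dis6;
    return e > cap ? cap : e;
}
```

At a separation equal to the summed radii the term evaluates to -eps * wt, the bottom of the well, which is a convenient sanity check on the coefficients.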

static l_RegionPtr getRegionToUse(double *coords, int natoms, int *r_idx, int *r_npoints )
{
    l_ComfaRegion *r;
    static double minx, maxx, miny, maxy, minz, maxz;
    int i;
    double x,y,z;
    double cminx, cminy, cminz, cmaxx, cmaxy, cmaxz;
    double edgeFact = 0.05;

    cminx = cminy = cminz = 99999.0;
    cmaxx = cmaxy = cmaxz = -99999.0;

    for ( i = 0; i < natoms; i++ )
    {
        x = *coords;
        y = *(coords+1);
        z = *(coords+2);
        if ( x < cminx )
            cminx = x;
        if ( x > cmaxx )
            cmaxx = x;
        if ( y < cminy )
            cminy = y;
        if ( y > cmaxy )
            cmaxy = y;
        if ( z < cminz )
            cminz = z;
        if ( z > cmaxz )
            cmaxz = z;
        coords += 3;
    }

    for ( i = minRegion; i < max_regions; i++ )
    {
        r = regions[i];

        minx = r->box_array[0].lo[0] + edgeFact;
        miny = r->box_array[0].lo[1] + edgeFact;
        minz = r->box_array[0].lo[2] + edgeFact;

        maxx = minx + ( (double) r->box_array[0].nstep[0] - 1.0 ) *
               r->box_array[0].stepsize[0] - (edgeFact*2.0);
        maxy = miny + ( (double) r->box_array[0].nstep[1] - 1.0 ) *
               r->box_array[0].stepsize[1] - (edgeFact*2.0);
        maxz = minz + ( (double) r->box_array[0].nstep[2] - 1.0 ) *
               r->box_array[0].stepsize[2] - (edgeFact*2.0);

#if 0
        if ( r->box_array[0].lo[0] == 0.0 )
            minx = -0.1;
#endif

        if ( cminx >= minx && cmaxx <= maxx && cminy >= miny && cmaxy <= maxy
             && cminz >= minz && cmaxz <= maxz )
        {
            *r_idx = i;
            *r_npoints = r->n_points;
            regionUseCnts[i] += 1;
            return r;
        }
    }

    i = max_regions - 1;
    *r_idx = i;
    regionUseCnts[i] += 1;
    r = regions[i];
    *r_npoints = r->n_points;
    return r;
}

static int getCordExtents(double *coords, int natoms, double *r_minx, double *r_miny, double *r_minz,
                          double *r_maxx, double *r_maxy, double *r_maxz )
{
    double minx, maxx, miny, maxy, minz, maxz;
    double x,y,z;
    int i;

    minx = maxx = *coords;
    miny = maxy = *(coords+1);
    minz = maxz = *(coords+2);
    coords += 3;

    for ( i = 1; i < natoms; i++ )
    {
        x = *coords;
        y = *(coords+1);
        z = *(coords+2);
        coords += 3;

        if ( x < minx )
            minx = x;
        else if ( x > maxx )
            maxx = x;
        if ( y < miny )
            miny = y;
        else if ( y > maxy )
            maxy = y;
        if ( z < minz )
            minz = z;
        else if ( z > maxz )
            maxz = z;
    }

    *r_minx = minx;
    *r_maxx = maxx;
    *r_miny = miny;
    *r_maxy = maxy;
    *r_minz = minz;
    *r_maxz = maxz;
    return 0;
}

static int atomsOutside(double *coords, int natoms, l_RegionPtr regp, double *atwts, double *r_outpen )
{ 

    static l_RegionPtr lastreg;
    static double minx, maxx, miny, maxy, minz, maxz;
    int i;
    int outside;
    double x,y,z;
    double dist;
    double edgeFact = 0.0;
    double incrfact;
    double outsidePen = 0.0;

    if (regp != lastreg )
    {
        minx = regp->box_array[0].lo[0] + edgeFact;
        miny = regp->box_array[0].lo[1] + edgeFact;
        minz = regp->box_array[0].lo[2] + edgeFact;

        maxx = minx + (double) ( regp->box_array[0].nstep[0] - 1 ) *
               regp->box_array[0].stepsize[0] - (edgeFact*2.0);
        maxy = miny + (double) ( regp->box_array[0].nstep[1] - 1 ) *
               regp->box_array[0].stepsize[1] - (edgeFact*2.0);
        maxz = minz + (double) ( regp->box_array[0].nstep[2] - 1 ) *
               regp->box_array[0].stepsize[2] - (edgeFact*2.0);

        /* When calculating atoms outside the region, count the atoms close to the edge
           as well.
        */

        lastreg = regp;
#if 0
        fprintf(stderr, " %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf \n",
                minx, maxx, miny, maxy, minz, maxz );
#endif
    }


    outsidePen = 0.0;

    for ( i = outside = 0; i < natoms; i++ )
    {
        x = *coords;
        y = *(coords+1);
        z = *(coords+2);

        if ( x < minx || x > maxx || y < miny || y > maxy || z < minz || z > maxz )
        {
            outside++;

            /* calculate a crude distance anyway */
            dist = 0.0;
            if ( x < minx )
                dist += x*x - minx*minx;
            else if ( x > maxx )
                dist += x*x - maxx*maxx;
            if ( y < miny )
                dist += y*y - miny*miny;
            else if ( y > maxy )
                dist += y*y - maxy*maxy;
            if ( z < minz )
                dist += z*z - minz*minz;
            else if ( z > maxz )
                dist += z*z - maxz*maxz;
            dist = fabs(dist); /* just in case */

            if (dist >= 1.0)
                incrfact = STERIC_MAX * atwts[i];
            else
                incrfact = STERIC_MAX * atwts[i] * dist;
            outsidePen += incrfact*incrfact;
#if 0
            fprintf(stderr,"outside %d atom:%d %6.2lf %6.2lf %6.2lf points: %d %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf \n",
                    outside, i, x, y, z, regp->n_points, minx, miny, minz, maxx, maxy,
                    maxz );
#endif
        }
        coords += 3;
    }

    *r_outpen = outsidePen;

#if 0
    fprintf(stderr,"i_extent: x %d %d y %d %d z %d %d\n",
            (int) cminx, (int) cmaxx, (int) cminy, (int) cmaxy, (int) cminz, (int) cmaxz );
    fprintf(stderr,"extent: x %6.1lf %6.1lf %6.1lf %6.1lf %6.1lf %6.1lf \n",
            cminx, cmaxx, cminy, cmaxy, cminz, cmaxz );
#endif

    if ( outside )
        t_outside++; /* t_outside counts how many compounds have at least one atom outside
                        the field */
    t_fields++;
    return outside;
}
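
The containment test used by atomsOutside can be illustrated in isolation. The sketch below counts atoms lying outside the lattice extents of a single box; it omits the penalty term and the cached static extents, and the SketchBox type and sketch_count_outside name are hypothetical stand-ins for the hidden l_Box data type rather than anything taken from the original source.

```c
#include <assert.h>

/* Hypothetical stand-in for one CoMFA box: low corner, number of lattice
   points per axis, and lattice spacing per axis. */
typedef struct {
    double lo[3];
    int    nstep[3];
    double stepsize[3];
} SketchBox;

/* Counts atoms whose coordinates fall outside the box on any axis.
   coords holds natoms consecutive (x, y, z) triples. */
int sketch_count_outside(const double *coords, int natoms, const SketchBox *box)
{
    int i, k, outside = 0;

    assert(coords && box && natoms >= 0);
    for (i = 0; i < natoms; i++, coords += 3) {
        for (k = 0; k < 3; k++) {
            /* the last lattice point on this axis is the upper extent */
            double hi = box->lo[k] + (double)(box->nstep[k] - 1) * box->stepsize[k];
            if (coords[k] < box->lo[k] || coords[k] > hi) {
                outside++;
                break;                   /* count each atom at most once */
            }
        }
    }
    return outside;
}
```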

double *TOP_STER_EVAL_ALL_RB_ATTEN(
/* computes and returns a CoMFA steric field, to be freed by caller when done */
    struct CtConnectionTable *ct,
    l_RegionPtr regp,
    int root,        /* atom ID of fragment root */
    double *acoord,  /* atomic coordinate array. If NIL, coordinates are retrieved from ct */
    double *AtWts )  /* optionally, if not NIL, these are additional user-supplied wts for field
                        calculation */
{
#ifndef NO_COMPRESSION
    static int max_alloc;
    static double *st_steric;
#endif

    int natot, nat, ix, iy, iz;
    double *steric=NIL, *TOP_FIELD_RB_WTS(), *ftemp, *coord, *vAwt=NIL, *vBwt=NIL,
           *va, *vb, *st;
    double radnow, epsnow, diff, dis2, dis6, dis12, x, y, z, atm_steric, sum_steric,
           TOP_GET_ATOM_VDW_RADIUS();
    double xd, yd, zd;
    double maxw;
    double stepz, stepy, stepx;
    int nstepz, nstepy, nstepx;
    double lowz, lowy, lowx;
#if 0
    int startEmpty, endEmpty;
#endif
    int npoints;
    int freeWeights = 0;
    int outsideCnt = 0;
#if 0
    static double mindis = 99999.0;
    static double maxdis = -99999.0;
    static double maxdists[50];
    static int distIdx = -1;
    double abs_steric;
#endif

#define MIN_SQ_DISTANCE 1.0e-4
#define RADIUS_C3 1.7
#define EPSILON_C3 .107
#define STERIC_MAX 30.0

#if 0
    if (distIdx == -1 )
    {
        for ( nat = 0; nat < 50; nat++ )
            maxdists[nat] = STERIC_MAX * -1.0;
        distIdx = 0;
    }
#endif

    /* get coordinates, # atoms, RB attenuation for each atom */
    if ((ftemp = acoord )) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &ftemp, &natot )) goto error;

#if 0
    AtWts = computeVdwWeights(ct, root - 1, -1, q_ReductionFactor, (int **) 0 );
#endif

    if ( !AtWts )
    {
        AtWts = (double *) malloc( natot * sizeof(double) );
        for ( nat = 0; nat < natot; nat++ )
            AtWts[nat] = 1.0;
        freeWeights = 1;
    }

#if 0
    if (!(AtWts = TOP_FIELD_RB_WTS( ct, root, (set_ptr) 0 ) )) goto cleanup;
    for ( nat = 0; q_debugfp && ext_vdw_wt && nat < ct->atomCount; nat++ )
    {
        fprintf(q_debugfp ,"# weights %d %8.3lf %8.3lf\n", nat+1, AtWts[nat],
                ext_vdw_wt[nat] );
    }
#endif


    /* compute VDW terms for each atom (not for each atom type as in SYBYL) */
    if (!(vAwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (!(vBwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (regp->box_array[0].atom_type != 1 || regp->n_boxes != 1)
        fprintf( stderr, "WARNING: The C.3 probe atom type in a single box is always used in the steric field calculation.\n" );
    for (nat=1; nat <= natot; nat++)
    {
        radnow = TOP_GET_ATOM_VDW_RADIUS( ct, nat, &epsnow );
        radnow += RADIUS_C3;
        epsnow = sqrt( epsnow * EPSILON_C3 );
        vAwt[ nat-1 ] = epsnow * 2.0 * pow( radnow, 6.0 ) * AtWts[ nat-1 ];
        vBwt[ nat-1 ] = epsnow * pow( radnow, 12.0 ) * AtWts[ nat-1 ];
#if 0
        if (ext_vdw_wt) {
            vAwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
            vBwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
        }
#endif
    }


    /* empty output array */
    /* Don't initialize with calloc: every field value is set below, so zeroing is a waste of time.
       A 38% speedup was obtained by calling malloc instead of calloc.
    */

    nstepz = regp->box_array[0].nstep[2];
    nstepy = regp->box_array[0].nstep[1];
    nstepx = regp->box_array[0].nstep[0];

    stepz = regp->box_array[0].stepsize[2];
    stepy = regp->box_array[0].stepsize[1];
    stepx = regp->box_array[0].stepsize[0];

    npoints = nstepz * nstepy * nstepx;

    lowz = regp->box_array[0].lo[2];
    lowy = regp->box_array[0].lo[1];
    lowx = regp->box_array[0].lo[0];

#ifndef NO_COMPRESSION
    if ( npoints > max_alloc )
    {
        if ( !max_alloc )
            max_alloc = 4000;
        while ( npoints > max_alloc )
            max_alloc *= 2;
        if ( st_steric )
            free((char *) st_steric );
        st_steric = (double *) malloc(sizeof(double) * max_alloc );
    }
    steric = st_steric;
#else
    steric = (double *) malloc( npoints * sizeof( double ) );
#endif

    st = steric;


    /* cycling over output array elements */
    for (iz=0, z=lowz; iz < nstepz; iz++, z += stepz )
    for (iy=0, y=lowy; iy < nstepy; iy++, y += stepy )
    for (ix=0, x=lowx; ix < nstepx; ix++, x += stepx )
    {
        /* cycling over ligand atoms */
        for ( nat = 0, coord = ftemp, sum_steric = 0.0, va = vAwt, vb = vBwt;
              nat < natot && sum_steric < STERIC_MAX;
              nat++, va++, vb++)
        {
#if 0
            dis2 = x - *coord++; dis2 *= dis2;
            diff = y - *coord++; diff *= diff; dis2 += diff;
            diff = z - *coord++; diff *= diff; dis2 += diff;
#endif
            xd = x - *coord++;
            yd = y - *coord++;
            zd = z - *coord++;
            dis2 = xd*xd + yd*yd + zd*zd;
#if 0
            if ( dis2 > 49.0 )
                continue;
#endif
            if ( dis2 >= MIN_SQ_DISTANCE )
            {
                dis6 = dis2 * dis2 * dis2;
                dis12 = dis6 * dis6;
                atm_steric = (*vb)/dis12 - (*va)/dis6;
#if 0
                abs_steric = fabs(atm_steric);
                if ( AtWts[nat] == 1.0 && dis2 > 0.0)
                {
                    if ( dis2 < mindis && abs_steric < 0.001 )
                    {
                        fprintf(stderr,"%10.8lf dis:%7.3lf\n", atm_steric, dis2 );
                        mindis = dis2;
                    }
                    distIdx = (int) dis2;
                    if ( distIdx < 49 && abs_steric > maxdists[distIdx] )
                    {
                        fprintf(stderr,"idx %d: %10.8lf dis:%10.5lf abs:%8.4lf max\n",
                                distIdx, atm_steric, dis2, abs_steric);
                        maxdists[distIdx] = abs_steric;
                    }
                }
#endif
                maxw = STERIC_MAX * AtWts[ nat ];
                if ( atm_steric > maxw )
                    atm_steric = maxw;
            }
            else
            {
                atm_steric = STERIC_MAX * AtWts[ nat ];
            }
            sum_steric += atm_steric;
        }
        *st = sum_steric > STERIC_MAX ? STERIC_MAX : sum_steric;
        st++;
    }

#if 0
    for ( st = steric, iz = startEmpty = 0; iz < npoints && *st < 0.01; iz++, st++ )
    {
        startEmpty++;
    }
    for ( st = steric + (npoints - 1), iz = npoints, endEmpty = 0; iz && *st < 0.01; iz--, st-- )
    {
        endEmpty++;
    }
    fprintf(stderr, "%d %d of %d %6.2lf \n",
            startEmpty, endEmpty, npoints, ((double) (startEmpty + endEmpty) * 100.0)/(double)
            npoints );
#endif

cleanup: 

if (AtWts && freeWeights) free ( (char*) AtWts ); 
if (vAwt) UTL_MEM_FREE( vAwt ); 
if (vBwt) UTL_MEM_FREE( vBwt ); 

error: 

return( steric ); 

} 
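
The three nested loops above fill the output array with x varying fastest, then y, then z, so each lattice point has a fixed flat index. A minimal sketch of that layout follows; the helper names sketch_grid_index and sketch_grid_coord are hypothetical and only illustrate the indexing implied by the loop order.

```c
#include <assert.h>

/* Hypothetical helper: flat array index of lattice point (ix, iy, iz)
   when x varies fastest, then y, then z. nx and ny are the numbers of
   lattice points along x and y. */
int sketch_grid_index(int ix, int iy, int iz, int nx, int ny)
{
    assert(ix >= 0 && ix < nx && iy >= 0 && iy < ny && iz >= 0);
    return (iz * ny * nx) + (iy * nx) + ix;
}

/* Hypothetical helper: coordinate of lattice point i along one axis,
   given the low corner and the lattice spacing on that axis. */
double sketch_grid_coord(double lo, double stepsize, int i)
{
    return lo + (double)i * stepsize;
}
```

With this layout a 10 x 10 x 10 region maps its last lattice point to index 999, matching a field of 1000 values.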

double *TOP_STER_ATOM_EVAL_ALL_RB_ATTEN(
/* ============================================================ */
/* computes and returns a CoMFA steric field, to be freed by caller when done,
   this version only computes the fields around each atom, outer loop is the ct's atoms */
    struct CtConnectionTable *ct,
    l_RegionPtr regp,
    int root,        /* atom ID of fragment root */
    double *acoord,  /* atomic coordinate array. If NIL, coordinates are retrieved from ct */
    double *AtWts )  /* optionally, if not NIL, these are additional user-supplied wts for field
                        calculation */
{
#ifndef NO_COMPRESSION
    static int max_alloc;
    static double *st_steric;
#endif

    int natot, nat, ix, iy, iz;
    double *steric=NIL, *TOP_FIELD_RB_WTS(), *ftemp, *coord, *vAwt=NIL, *vBwt=NIL,
           *st;
    double va, vb;
    double radnow, epsnow, diff, dis2, dis6, dis12, x, y, z, atm_steric, sum_steric,
           TOP_GET_ATOM_VDW_RADIUS();
    double xd, yd, zd;
    double maxw;
    double stepz, stepy, stepx;
    int nstepz, nstepy, nstepx;
    double lowz, lowy, lowx;
    double curr_lowz, curr_lowy, curr_lowx;
    int curr_nstepsz, curr_nstepsy, curr_nstepsx;
    int curr_ix, curr_iy, curr_iz;
    double curr_x, curr_y, curr_z;
    int max_steps; /* assumes stepz, stepy, and stepx are the same step size */
    int max_xSteps, max_ySteps, max_zSteps;
#if 0
    int startEmpty, endEmpty;
#endif
    int npoints;
    int freeWeights = 0;
    int outsideCnt = 0;
#if 0
    static double mindis = 99999.0;
    static double maxdis = -99999.0;
    static double maxdists[50];
    static int distIdx = -1;
    double abs_steric;
#endif

#define MIN_SQ_DISTANCE 1.0e-4
#define RADIUS_C3 1.7
#define EPSILON_C3 .107
#define STERIC_MAX 30.0

#if 0
    if (distIdx == -1 )
    {
        for ( nat = 0; nat < 50; nat++ )
            maxdists[nat] = STERIC_MAX * -1.0;
        distIdx = 0;
    }
#endif

    /* get coordinates, # atoms, RB attenuation for each atom */
    if ((ftemp = acoord )) {
        if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &natot )) goto error;
    } else if (!DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &ftemp, &natot )) goto error;

#if 0
    AtWts = computeVdwWeights(ct, root - 1, -1, q_ReductionFactor, (int **) 0 );
#endif

    if ( !AtWts )
    {
        AtWts = (double *) malloc( natot * sizeof(double) );
        for ( nat = 0; nat < natot; nat++ )
            AtWts[nat] = 1.0;
        freeWeights = 1;
    }

#if 0
    if (!(AtWts = TOP_FIELD_RB_WTS( ct, root, (set_ptr) 0 ) )) goto cleanup;
    for ( nat = 0; q_debugfp && ext_vdw_wt && nat < ct->atomCount; nat++ )
    {
        fprintf(q_debugfp ,"# weights %d %8.3lf %8.3lf\n", nat+1, AtWts[nat],
                ext_vdw_wt[nat] );
    }
#endif

    /* compute VDW terms for each atom (not for each atom type as in SYBYL) */
    if (!(vAwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (!(vBwt = (double *) UTL_MEM_ALLOC( sizeof(double) * natot ))) goto cleanup;
    if (regp->box_array[0].atom_type != 1 || regp->n_boxes != 1)
        fprintf( stderr, "WARNING: The C.3 probe atom type in a single box is always used in the steric field calculation.\n" );
    for (nat=1; nat <= natot; nat++ )
    {
        radnow = TOP_GET_ATOM_VDW_RADIUS( ct, nat, &epsnow );
        radnow += RADIUS_C3;
        epsnow = sqrt( epsnow * EPSILON_C3 );
        vAwt[ nat-1 ] = epsnow * 2.0 * pow( radnow, 6.0 ) * AtWts[ nat-1 ];
        vBwt[ nat-1 ] = epsnow * pow( radnow, 12.0 ) * AtWts[ nat-1 ];
#if 0
        if (ext_vdw_wt) {
            vAwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
            vBwt[ nat-1 ] *= ext_vdw_wt[ nat-1 ];
        }
#endif
    }

    /* empty output array */
    /* Don't initialize with calloc: every field value is set below, so zeroing is a waste of time.
       A 38% speedup was obtained by calling malloc instead of calloc.
    */

    nstepz = regp->box_array[0].nstep[2];
    nstepy = regp->box_array[0].nstep[1];
    nstepx = regp->box_array[0].nstep[0];

    stepz = regp->box_array[0].stepsize[2];
    stepy = regp->box_array[0].stepsize[1];
    stepx = regp->box_array[0].stepsize[0];

    npoints = nstepz * nstepy * nstepx;

    lowz = regp->box_array[0].lo[2];
    lowy = regp->box_array[0].lo[1];
    lowx = regp->box_array[0].lo[0];

    max_steps = (int) (4.0 / stepx);
    if ( max_steps <= 0 || ((double) max_steps * stepx ) < 4.0 )
        max_steps += 1;

    max_xSteps = max_ySteps = max_zSteps = max_steps * 2;

    if ( max_xSteps > nstepx )
        max_xSteps = nstepx;
    if ( max_ySteps > nstepy )
        max_ySteps = nstepy;
    if ( max_zSteps > nstepz )
        max_zSteps = nstepz;


#if 0
    fprintf(stderr,"max steps: %d %d %d %d\n", max_steps, max_xSteps, max_ySteps, max_zSteps
            );
#endif

#ifndef NO_COMPRESSION
    if ( npoints > max_alloc )
    {
        if ( !max_alloc )
            max_alloc = 4000;
        while ( npoints > max_alloc )
            max_alloc *= 2;
        if ( st_steric )
            free((char *) st_steric );
        st_steric = (double *) malloc(sizeof(double) * max_alloc );
    }
    steric = st_steric;
    memset((char *) st_steric, '\0', sizeof(double) * npoints );
#else
    steric = (double *) calloc( npoints, sizeof( double ) );
#endif

    st = steric;


    for ( nat = 0, coord = ftemp;
          nat < natot;
          nat++ )
    {
        va = *(vAwt + nat);
        vb = *(vBwt + nat);
        curr_x = *coord;
        curr_y = *(coord+1);
        curr_z = *(coord+2);
        coord += 3;

        iz = (int) ( fabs(curr_z - lowz + 0.5) / stepz);
        iy = (int) ( fabs(curr_y - lowy + 0.5) / stepy);
        ix = (int) ( fabs(curr_x - lowx + 0.5) / stepx);

        curr_iz = iz - max_steps;
        curr_iy = iy - max_steps;
        curr_ix = ix - max_steps;

        curr_nstepsz = iz + max_steps + 1;
        curr_nstepsy = iy + max_steps + 1;
        curr_nstepsx = ix + max_steps + 1;

        /* check boundary conditions, where the atom is near the outside of the region
        */
        if ( curr_iz < 0 )
            curr_iz = 0;
        if ( curr_iy < 0 )
            curr_iy = 0;
        if ( curr_ix < 0 )
            curr_ix = 0;

        /* Compute the fringe if outside the range */
        if ( curr_iz >= nstepz )
            curr_iz = nstepz - 1;
        if ( curr_iy >= nstepy )
            curr_iy = nstepy - 1;
        if ( curr_ix >= nstepx )
            curr_ix = nstepx - 1;

        if ( curr_nstepsz > nstepz )
            curr_nstepsz = nstepz;
        if ( curr_nstepsy > nstepy )
            curr_nstepsy = nstepy;
        if ( curr_nstepsx > nstepx )
            curr_nstepsx = nstepx;

        curr_lowz = lowz + (double) curr_iz * stepz;
        curr_lowy = lowy + (double) curr_iy * stepy;
        curr_lowx = lowx + (double) curr_ix * stepx;

        maxw = STERIC_MAX * AtWts[ nat ];

#if 0
        fprintf(stderr,"xyz %6.1lf %6.1lf %6.1lf low: %6.1lf %6.1lf %6.1lf steps: %d %d %d clow: %6.1lf %6.1lf %6.1lf idx: %d %d %d ridx: %d %d %d csteps:%d %d %d\n",
                curr_x, curr_y, curr_z,
                lowx, lowy, lowz,
                nstepx, nstepy, nstepz,
                curr_lowx, curr_lowy, curr_lowz,
                curr_ix, curr_iy, curr_iz,
                ix, iy, iz,
                curr_nstepsx, curr_nstepsy, curr_nstepsz );
#endif


        /* cycling over output array elements */
        for ( iz=curr_iz, z=curr_lowz; iz < curr_nstepsz; iz++, z += stepz )
        {
            zd = z - curr_z;
            zd = zd*zd;

            for (iy=curr_iy, y=curr_lowy; iy < curr_nstepsy; iy++, y += stepy )
            {
                yd = y - curr_y;
                yd = yd*yd;
#if 0
                if ((zd+yd) > 49.0)
                    continue;
#endif
                st = st_steric + ( (iz * nstepy * nstepx ) + (iy * nstepx ) + curr_ix);
#if 0
                fprintf(stderr,"base %d from %d %d %d (matrix: %d %d %d)\n",
                        (iz * nstepy * nstepx ) + (iy * nstepx) + curr_ix,
                        curr_ix, iy, iz, nstepx, nstepy, nstepz );
#endif
#if 0
                if ( !(iy%3))
                    sleep(1);
#endif
                for (ix=curr_ix, x=curr_lowx; ix < curr_nstepsx; ix++, x += stepx )
                {
                    sum_steric = *st;
                    xd = x - curr_x;
                    dis2 = xd*xd + yd + zd;

                    if ( dis2 > 49.0 )
                        continue;

                    if ( dis2 >= MIN_SQ_DISTANCE )
                    {
                        dis6 = dis2 * dis2 * dis2;
                        dis12 = dis6 * dis6;
                        atm_steric = vb/dis12 - va/dis6;
                        if ( atm_steric > maxw )
                            atm_steric = maxw;
                    }
                    else
                    {
                        atm_steric = maxw;
                    }

                    sum_steric += atm_steric;
                    *st = sum_steric > STERIC_MAX ? STERIC_MAX : sum_steric;
                    st++;
                }
            }
        }
    }

#if 0
    for ( st = steric, iz = startEmpty = 0; iz < npoints && *st < 0.01; iz++, st++ )
    {
        startEmpty++;
    }
    for ( st = steric + (npoints - 1), iz = npoints, endEmpty = 0; iz && *st < 0.01; iz--, st-- )
    {
        endEmpty++;
    }
    fprintf(stderr,"%d %d of %d %6.2lf \n",
            startEmpty, endEmpty, npoints, ((double) (startEmpty + endEmpty) * 100.0)/(double)
            npoints );
#endif


cleanup: 

if (AtWts && freeWeights) free ( (char*) AtWts ); 
if (vAwt) UTL_MEM_FREE( vAwt ); 
if (vBwt) UTL_MEM_FREE( vBwt ); 

error: 

return( steric ); 

} 
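
The atom-centred variant above visits only a window of lattice points around each atom and clamps that window to the region. The sketch below shows the clamping for one axis, following the same order of checks as the listing; sketch_window is a hypothetical name and the half-open [first, end) convention is stated here as an assumption about how the loop bounds are used.

```c
#include <assert.h>

/* Hypothetical helper: computes the half-open index window [*first, *end)
   of up to 2*max_steps + 1 lattice points centred on index i, clamped to
   the valid range [0, nstep) along one axis. */
void sketch_window(int i, int max_steps, int nstep, int *first, int *end)
{
    assert(nstep > 0 && max_steps >= 0 && first && end);

    *first = i - max_steps;
    *end   = i + max_steps + 1;

    if (*first < 0)
        *first = 0;              /* atom near the low edge of the region */
    if (*first >= nstep)
        *first = nstep - 1;      /* atom beyond the high edge: keep the fringe */
    if (*end > nstep)
        *end = nstep;
}
```

Restricting the inner loops to this window is what lets the routine skip lattice points that are too far from the atom to contribute, at the cost of zero-initializing the output array first.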

int TOP_STER_REGION_MODE(int regionMode )
{
    if ( regionMode < 0 )
        regionMode = 0;
    else if ( regionMode > 2 )
        regionMode = 2;
    q_regionMode = regionMode;
}

static int makeTopRegions(double stepSize, int numFrags)
{
    int i;
    l_ComfaRegion *r;
    l_Box *b;
    int nsteps;
    static double lastStepSize;
    static int printed;
    int intStep;
    int baseSteps = 5;
    int steps[3];
    double fullMult;
    int maxtrixSize;
    int totalPoints;
    int bigseen = 0;
    double baseX, baseY, baseZ;
    int done;

    if ( lastStepSize == stepSize )
        return 0;
    lastStepSize = stepSize;
    baseX = -0.1;
    baseY = -6.0;
    baseZ = -4.0;
    totalPoints = 0;

    if ( qxmin < 999.0 && qmode )
    {
        baseX = (double) ( (int) (qxmin - 1.0) );
        baseY = (double) ( (int) (qymin - 1.0) );
        baseZ = (double) ( (int) (qzmin - 1.0) );
        baseSteps = 0;

        steps[0] = (int) ((qxmax - baseX + 1.50) / stepSize) + 1;
        steps[1] = (int) ((qymax - baseY + 1.50) / stepSize) + 1;
        steps[2] = (int) ((qzmax - baseZ + 1.50) / stepSize) + 1;
#ifdef TRIPOS_VERSION
        fprintf(stderr,"%6.2lf %6.2lf %6.2lf, %6.2lf %6.2lf %6.2lf %d %d %d\n",
                qxmin, qymin, qzmin, qxmax, qymax, qzmax, steps[0], steps[1], steps[2]
                );
#endif
    }

    else
    {
        steps[0] = steps[1] = steps[2] = 5;
    }

    maxtrixSize = steps[0] * steps[1] * steps[2];
    max_regions = NO_REGIONS;

    /*
       We have to limit the number of regions generated to conserve memory.
       If the initial region size to fit the query in is huge, then let's not
       create too many regions around it.
    */

    for ( i = bigseen = done = 0; !done && i < max_regions; i++ )
    {
        if ( regions[i] )
            free((char *) regions[i] );
        r = (l_RegionPtr) UTL_MEM_CALLOC(1,sizeof(l_ComfaRegion));
        r->n_boxes = 1;
        regions[i] = r;
        if ( r->box_array )
            free((char *) r->box_array );
        b = r->box_array = (l_BoxPtr) UTL_MEM_CALLOC(1,sizeof(l_Box) );
        b[0].atom_type = 1;
        b[0].stepsize[0] = b[0].stepsize[1] = b[0].stepsize[2] = stepSize;
        b[0].lo[0] = baseX;
        b[0].lo[1] = baseY;
        b[0].lo[2] = baseZ;
        b[0].nstep[0] = steps[0];
        b[0].nstep[1] = steps[1];
        b[0].nstep[2] = steps[2];

#ifdef TRIPOS_VERSION
        if ( !printed )
        {
            fprintf(stderr, " %d: steps: %d,%d,%d stepsize: %6.2lf base: %6.2lf %6.2lf %6.2lf\n",
                    i, steps[0], steps[1], steps[2], stepSize, b[0].lo[0], b[0].lo[1], b[0].lo[2]
                    );
        }
#endif

        r->n_points = steps[0] * steps[1] * steps[2];
        totalPoints += r->n_points;
        done = 0;

        if ( i >= 3 && steps[0] > 12 && steps[1] > 12 && steps[2] > 12 )
            done = i+1;

        if ( r->n_points > 3000 || totalPoints > 6000 )
        {
            if ( bigseen == 0 && r->n_points < 5000 && totalPoints < 10000 )
            {
                baseX -= stepSize;
                baseY -= stepSize;
                baseZ -= stepSize;
                steps[0] += 2;
                steps[1] += 2;
                steps[2] += 2;
                bigseen = 1;
            }
            else
            {
                done = i+1;
            }
        }
        else
        {
            if ( i < 4 )
            {
                steps[0] += 1;
                steps[1] += 1;
                steps[2] += 1;
                if ( i % 2 )
                {
                    baseZ -= stepSize;
                    baseX -= stepSize;
                }
                else
                    baseY -= stepSize;
            }
            else
            {
                if ( steps[0] < 13 )
                {
                    steps[0] += 1;
                    if ( !((i+4) % 4) )
                        baseX -= stepSize;
                }
                if ( steps[1] < 13 )
                {
                    steps[1] += 1;
                    if ( (i+2) % 3 )
                        baseY -= stepSize;
                }
                if ( steps[2] < 13 )
                {
                    steps[2] += 1;
                    if ( i % 2 )
                        baseZ -= stepSize;
                }
            }
        }
    }

    if ( done && done < NO_REGIONS )
        max_regions = done;
    printed = 1;
    return 1;
}

l_RegionPtr TOP_MAKE_STD_REGION()
/* creates a run-time description of the standard CoMFA region used for topomers
   source of region description is $DSERV_TB/rsh.rgn */
{
    l_RegionPtr R;

    if (!(R = (l_RegionPtr) UTL_MEM_CALLOC(1,sizeof(l_ComfaRegion)))) goto error;
    R->n_boxes = 1;
    if (!(R->box_array = (l_BoxPtr) UTL_MEM_CALLOC(1,sizeof(l_Box)))) goto error;

    if ( q_regionMode == 0 )
    {
        R->n_points = 1000;
        R->box_array[0].lo[0] = -4.0;
        R->box_array[0].lo[1] = -12.0;

    return (l_RegionPtr) 0;

} 

double *TOP_FIELD_RB_WTS( struct CtConnectionTable *ct, int rootid,
/* ============================================================ */
    set_ptr a2use /* optionally, if not NIL, need to process only this set of atoms */
    )
/* constructs and returns weighting-by-rotatable-bond array for each atom */
{
/* pseudo code for FIELD_RB_WTS()

   while saw new atoms
       uncover atoms that stopped last shell growth
       grow next "rotational shell"
       while adding to shell
           for each atom in shell
               get neighbors not seen
               for each neighbor
                   if bond is rotatable (acyclic, > 1 attached atom, not =,am,#)
                       cover all other atoms attached to atom for this shell
                   add it to shell
*/
    double *ansr = NIL, *vals = NIL, factor, nowfact = 1.0;
    int nats, b, aggcount, atid, aggid, loop, size, inRing, natt, ntoats, toats[20];
    set_ptr aggats = NIL, allats = NIL, nuls = NIL, endatms = NIL, end_cands = NIL;
    CtBondTypeDef bType;

    /* be sure rings were perceived */
    if (!DB_CT_UTL_FIND_RINGS( ct )) goto cleanup;
    if (!DB_CT_GET_CT_ATTR( ct, CtCtAtomCount, &nats )) goto cleanup;

    /* output data allocations */
    if (!( vals = (double*) UTL_MEM_ALLOC( sizeof(double)*nats))) goto cleanup;
    factor = aggreg_descale;
    if (!(allats = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(aggats = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(nuls = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(endatms = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;
    if (!(end_cands = UTL_SET_CREATE( nats + 1 ) )) goto cleanup;

    UTL_SET_INSERT( aggats, rootid );
    UTL_SET_INSERT( allats, rootid );
    aggcount = loop = 1;

    while (TRUE) {
        while (TRUE) {
            aggid = -1;
            while ((aggid = UTL_SET_NEXT( allats, aggid )) >= 0 ) {
                /* put (acceptable) atoms attached to aggid into nuls */
                UTL_SET_CLEAR( nuls );
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, aggid, CtAtomBondCount, &ntoats ) )) goto error;
                if (ntoats > 20) goto toomanyattms;
                if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, aggid, CtAtomBondToAtoms, &toats ) )) goto error;
                for (natt=0; natt<ntoats; natt++) if (!a2use || UTL_SET_MEMBER(a2use,toats[natt]))
                    UTL_SET_INSERT( nuls, toats[ natt ] );
                /* remove atoms already processed from nuls */
                UTL_SET_DIFF_INPLACE( nuls, allats, nuls );
                UTL_SET_DIFF_INPLACE( nuls, endatms, nuls );
                /* identifying any atoms that terminate this aggregate */
                atid = -1;
                while ((atid = UTL_SET_NEXT( nuls, atid )) >= 0 ) {
                    /* skipping monovalent atoms */
                    if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atid, CtAtomBondCount, &ntoats ) )) goto error;
                    if (ntoats > 1) {
                        if (!(b = DB_CT_UTL_GET_BONDID( ct, atid, aggid ) )) goto error;
                        if (!DB_CT_GET_BOND_ATTR( ct, b, CtBondIsInRing, &inRing)
                            || !DB_CT_GET_BOND_ATTR( ct, b, CtBondType, &bType ) ) goto error;
                        if (!inRing && bType == CtBondTypeSingle ) {
                            /* have an end-of-aggregate atom, mark as end atoms all other attached atoms */
                            UTL_SET_CLEAR( end_cands );
                            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atid, CtAtomBondCount, &ntoats ) )) goto error;
                            if (ntoats > 20) goto toomanyattms;
                            if (!(DB_CT_GET_ANY_ATOM_ATTR( ct, atid, CtAtomBondToAtoms, &toats ) )) goto error;
                            for (natt=0; natt<ntoats; natt++) if (!a2use || UTL_SET_MEMBER(a2use, toats[natt]))
                                UTL_SET_INSERT( end_cands, toats[ natt ] );
                            UTL_SET_DELETE( end_cands, aggid );
                            UTL_SET_OR_INPLACE( endatms, end_cands, endatms );
                        }
                    }
                }
                UTL_SET_OR_INPLACE( aggats, nuls, aggats );
            }
            if (UTL_SET_CARDINALITY( aggats ) <= aggcount ) break;
            aggcount = UTL_SET_CARDINALITY( aggats );
            UTL_SET_OR_INPLACE( allats, aggats, allats );
        }

        /* debugging stuff .. */
        /*
        sprintf( tempString, "Aggregate %d (weight = %f ):", loop, nowfact );
        UBS_OUTPUT_MESSAGE( stdout, tempString );
        ashow( aggats, molp );
        ashow( aggats, molp );
        */
        /* if no atoms added, we are done! */
        if (UTL_SET_EMPTY( aggats )) break;
        /* record scaling factor for atoms in this aggregate */
        atid = -1;
        while ((atid = UTL_SET_NEXT( aggats, atid )) >= 0 ) {
            vals[ atid-1 ] = nowfact;
        }
        UTL_SET_OR_INPLACE( allats, aggats, allats );
        UTL_SET_CLEAR( aggats );
        UTL_SET_CLEAR( endatms );
        aggcount = 0;
        nowfact *= factor;
        loop++;
    }
    ansr = vals;

cleanup:
error:
    if (aggats) UTL_SET_DESTROY( aggats );
    if (allats) UTL_SET_DESTROY( allats );
    if (endatms) UTL_SET_DESTROY( endatms );
    if (end_cands) UTL_SET_DESTROY( end_cands );
    if (nuls) UTL_SET_DESTROY( nuls );
    return( ansr );

toomanyattms:
    fprintf( stderr, "More than twenty atoms attached to some atom in this structure.\n" );
    goto error;
}
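
The idea behind the rotatable-bond weighting can be shown on a plain bond list. The sketch below is a simplified variant and not the shell-growing procedure of the listing: it walks outward from the root and multiplies the weight by a factor each time a rotatable bond is crossed, so the atom directly across such a bond is already attenuated, whereas the listing's exact shell membership may differ. The SketchBond type, the sketch_rb_weights name, 0-based atom numbering, caller-supplied rotatable flags, and the fixed SKETCH_MAX_ATOMS limit are all assumptions made for illustration.

```c
#include <assert.h>

#define SKETCH_MAX_ATOMS 64   /* arbitrary illustrative limit */

/* Hypothetical bond record: 0-based atom indices and a flag saying whether
   the bond is rotatable (acyclic single bond, per the pseudo code). */
typedef struct {
    int a, b, rotatable;
} SketchBond;

/* Fills wts[i] with factor^(number of rotatable bonds between root and atom i).
   Atoms not connected to root keep weight 0. Returns the number of atoms reached. */
int sketch_rb_weights(int natoms, const SketchBond *bonds, int nbonds,
                      int root, double factor, double *wts)
{
    int queue[SKETCH_MAX_ATOMS], seen[SKETCH_MAX_ATOMS];
    int head = 0, tail = 0, i, k;

    assert(natoms > 0 && natoms <= SKETCH_MAX_ATOMS);
    assert(root >= 0 && root < natoms && wts);

    for (i = 0; i < natoms; i++) {
        seen[i] = 0;
        wts[i] = 0.0;
    }
    seen[root] = 1;
    wts[root] = 1.0;
    queue[tail++] = root;

    /* breadth-first walk outward from the root atom */
    while (head < tail) {
        int u = queue[head++];
        for (k = 0; k < nbonds; k++) {
            int v;
            if (bonds[k].a == u)      v = bonds[k].b;
            else if (bonds[k].b == u) v = bonds[k].a;
            else continue;
            if (seen[v]) continue;
            seen[v] = 1;
            wts[v] = bonds[k].rotatable ? wts[u] * factor : wts[u];
            queue[tail++] = v;
        }
    }
    return tail;
}
```

Because rotatable bonds are acyclic, every path from the root to a given atom crosses the same rotatable bonds, so the visiting order does not change the resulting weights in this simplified form.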

static char *fhex_field = NIL;
static int field_length = 0;

char *CT_FIELD2HEX( double *field, int size )
/* maps field to a hex string coarsely representing the field -
   caller must NOT free this string! */
{
    char *f;
    int i, j, fd;
    static double cutoff[16] = {9999., 0., 2., 4., 6., 8., 10., 12.,
                                14., 16., 18., 20., 22., 24., 26., 30. };

    if ( size != field_length) {
        if (fhex_field) UTL_MEM_FREE( fhex_field );
        if (!(fhex_field = UTL_MEM_ALLOC( sizeof( char) * (size+1) ) )) return NIL;
        field_length = size;
    }

    for (f = fhex_field, j = 0; j < size; j++, f++ ) {
        for ( i = 1, fd = FALSE; i < 16; i++ ) if (field[ j ] <= cutoff[ i ]) {
            fd = TRUE;
            break;
        }
        if (!fd) {
            fprintf( stderr, "Illegal steric field value set to missing.\n" );
            i = 0;
        }
        sprintf(f, "%.1x", i);
    }
    *f = '\0';
    return fhex_field;
}
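
The binning performed by CT_FIELD2HEX can be reproduced without the shared static buffer. The following sketch uses the cutoff table from the listing and writes into a caller-owned buffer instead; the names sketch_field_bin and sketch_field2hex are hypothetical, and the caller-supplied buffer is a design change made for this illustration only.

```c
#include <assert.h>

/* Cutoff table copied from the listing. Entry 0 is the "missing" slot. */
static const double sketch_cutoff[16] = {
    9999., 0., 2., 4., 6., 8., 10., 12.,
    14., 16., 18., 20., 22., 24., 26., 30.
};

/* Returns the first bin i in 1..15 with value <= cutoff[i], or 0 when the
   value exceeds every cutoff (treated as missing). */
int sketch_field_bin(double value)
{
    int i;
    for (i = 1; i < 16; i++)
        if (value <= sketch_cutoff[i])
            return i;
    return 0;
}

/* Writes one lowercase hex digit per field value, then a terminating NUL.
   out must have room for size + 1 characters. */
void sketch_field2hex(const double *field, int size, char *out)
{
    static const char digits[] = "0123456789abcdef";
    int j;

    assert(field && out && size >= 0);
    for (j = 0; j < size; j++)
        out[j] = digits[sketch_field_bin(field[j])];
    out[size] = '\0';
}
```

Each field value is thereby compressed to four bits, with steps of 2.0 between 0 and 26 and a final bin reaching the 30.0 cap.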

double TOP_GET_ATOM_VDW_RADIUS( struct CtConnectionTable *ct, int nat, double *epsnow )
/* ============================================================ */
/* hard coded to assign classical TAFF VDW properties */
{
    int sybat;
    char *sybname;
    static double a_eps[34] = {
        0.000, 0.107, 0.107, 0.107, 0.107,
        0.095, 0.095, 0.095, 0.116, 0.116,   /* 5 - 9 */
        0.095, 0.314, 0.095, 0.042, 0.434,
        0.314, 0.109, 0.623, 0.314, 0.095,   /* 15 - 19 */
        0.000, 0.400, 0.400, 0.600, 0.400,
        0.100, 0.000, 0.042, 0.095, 0.314,
        0.314, 0.095, 0.116, 0.107 };
    static double rval[34] = {
        0.000, 1.700, 1.700, 1.700, 1.700,
        1.550, 1.550, 1.550, 1.520, 1.520,   /* 5 - 9 */
        1.800, 1.550, 1.800, 1.500, 1.850,
        1.750, 1.470, 1.980, 1.800, 1.550,   /* 15 - 19 */
        0.000, 1.200, 1.200, 1.200, 1.200,
        1.341, 0.000, 2.100, 1.550, 1.800,
        1.800, 1.550, 1.520, 1.700 };

    if (!(DB_EX_ELEM_TO_SYB_ATOM_TYPE( ct, nat, &sybname, &sybat ))) {
        fprintf( stderr, "Warning: Atom type not found for atom ID %d.\n", nat );
        *epsnow = 0.0;
        return 0.0;
    }

    if ( sybat < 0 || sybat > 33 )
    {
        *epsnow = 0.0;
        return 0.0;
    }

    *epsnow = a_eps[sybat];
    return rval[sybat];

#if 0
    switch (sybat) {
    case 1:     /* c.3 */
    case 2:     /* c.2 */
    case 3:     /* c.ar */
    case 4:     /* c.1 */
    case 33:    /* c+ */
        *epsnow = .107; return( 1.7 );
    case 5:     /* n.3 */
    case 6:     /* n.2 */
    case 7:     /* n.1 */
    case 11:    /* n.ar */
    case 19:    /* n.pl3 */
    case 28:    /* n.am */
    case 31:    /* N+ */
        *epsnow = .095; return( 1.55 );
    case 8:     /* o.3 */
    case 9:     /* o.2 */
    case 32:    /* o.ar */
        *epsnow = .116; return( 1.52 );
    case 10:    /* s.3 */
    case 12:    /* p.3 */
    case 18:    /* s.2 */
    case 29:    /* s.o */
    case 30:    /* s.o2 */
        *epsnow = .314; return( 1.8 );
    case 13:    /* H */
        *epsnow = .042; return( 1.5 );
    case 14:    /* Br */
        *epsnow = .434; return( 1.85 );
    case 15:    /* Cl */
        *epsnow = .314; return( 1.75 );
    case 16:    /* F */
        *epsnow = .109; return( 1.47 );
    case 17:    /* I */
        *epsnow = .623; return( 1.98 );
    case 21:    /* Na */
    case 22:    /* K */
    case 24:    /* Li */
        *epsnow = 0.4; return( 1.2 );
    case 23:    /* Ca */
        *epsnow = 0.6; return( 1.2 );
    case 25:    /* Al */
        *epsnow = 0.1; return( 1.341 );
    case 27:    /* Si */
        *epsnow = 0.042; return( 2.1 );
    default:
        fprintf( stderr, "WARNING: Assigning no steric field from atom type: %s\n", sybname );
        *epsnow = 0.0; return( 0.0 );
    }
#endif
}

int TOP_REFLECT_COO( double *coo, set_ptr atms, int npt, int *aplane )
/* ==================================== */
/* reflects atms through the plane defined by the atoms whose IDs are in aplane, by modifying values
   in coo */
{
    double cent[3], eval[3], evec[3][3], mat[3][3], x, xsq, xy, xz,
           y, ysq, yz, z, zsq, *cx, *cy, *cz, l, m, n, d, *xyz, h;
    int na, nrot, elem;

    /* Now perform the sums to determine the parameters of the plane */
    /* equation. */
    x = xsq = y = ysq = z = zsq = xy = xz = yz = 0.0;
    for (na = 0; na < npt; na++) {
        cx = coo + 3 * ( aplane[ na ] - 1 );
        x += *cx;
        xsq += (*cx) * (*cx);
        cy = cx + 1;
        y += *cy;
        ysq += (*cy) * (*cy);
        cz = cy + 1;
        z += *cz;
        zsq += (*cz) * (*cz);
        xy += (*cx) * (*cy);
        xz += (*cx) * (*cz);
        yz += (*cy) * (*cz);
    }

    cent[0] = x / (double) npt;
    cent[1] = y / (double) npt;
    cent[2] = z / (double) npt;

    mat[0][0] = xsq - x * cent[0];
    mat[0][1] = xy - x * cent[1];
    mat[0][2] = xz - x * cent[2];
    mat[1][0] = xy - y * cent[0];
    mat[1][1] = ysq - y * cent[1];
    mat[1][2] = yz - y * cent[2];
    mat[2][0] = xz - z * cent[0];
    mat[2][1] = yz - z * cent[1];
    mat[2][2] = zsq - z * cent[2];

    /* calculate the plane */
    if (!UTL_GEOM_SYMM_EIGENSYS ((double *)mat, 3, eval, (double *) evec, &nrot)) goto error;

    l = evec[0][0];
    m = evec[1][0];
    n = evec[2][0];
    d = (l * cent[0] + m * cent[1] + n * cent[2]);

    /* perform reflection on the input coordinate sets */
    elem = -1;
    while ( (elem = UTL_SET_NEXT( atms, elem)) >= 0 ) {
        xyz = coo + (elem - 1) * 3;
        h = l * xyz[0] + m * xyz[1] + n * xyz[2] - d;
        xyz[0] -= 2.0 * l * h;
        xyz[1] -= 2.0 * m * h;
        xyz[2] -= 2.0 * n * h;
    }

    return TRUE;
error:
    return FALSE;
}
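The reflection step at the end of TOP_REFLECT_COO can be shown on its own. The sketch below is illustrative only: `reflect_point` is a hypothetical name, and it assumes the plane is given as a unit normal (l, m, n) with offset d, so that h is the signed distance of the point from the plane. How the listing's eigen-solver orders and normalizes its eigenvectors is not stated here and is not assumed.

```c
/* Hypothetical helper (an assumption, not from the listing): reflect point p
   through the plane nrm . x = d, where nrm is assumed to be a unit vector. */
static void reflect_point(double p[3], const double nrm[3], double d)
{
    /* signed distance from the plane */
    double h = nrm[0] * p[0] + nrm[1] * p[1] + nrm[2] * p[2] - d;
    int k;

    /* move the point twice that distance back along the normal */
    for (k = 0; k < 3; k++)
        p[k] -= 2.0 * nrm[k] * h;
}
```

For example, reflecting the point (1, 2, 3) through the plane z = 1 leaves x and y unchanged and moves z to -1.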

static int reflectAtoms( double *coo, int nAtoms, int npt, int *aplane )
/* ==================================== */
/* reflects atms through the plane defined by the atoms whose indexes (base 0) are in aplane, by
   modifying values in coo */
{
    double cent[3], eval[3], evec[3][3], mat[3][3], x, xsq, xy, xz,
           y, ysq, yz, z, zsq, *cx, *cy, *cz, l, m, n, d, *xyz, h;
    int na, nrot, elem;
    int *dn;

    if ( npt >= 3 )
        dn = findDirectionalNeighbors(g_ct, aplane[1], aplane[0], aplane[2] );
    else
        return FALSE;

    /* Now perform the sums to determine the parameters of the plane */
    /* equation. */
    x = xsq = y = ysq = z = zsq = xy = xz = yz = 0.0;
    for (na = 0; na < npt; na++) {
        cx = coo + 3 * ( aplane[ na ] );
        x += *cx;
        xsq += (*cx) * (*cx);
        cy = cx + 1;
        y += *cy;
        ysq += (*cy) * (*cy);
        cz = cy + 1;
        z += *cz;
        zsq += (*cz) * (*cz);
        xy += (*cx) * (*cy);
        xz += (*cx) * (*cz);
        yz += (*cy) * (*cz);
    }

    cent[0] = x / (double) npt;
    cent[1] = y / (double) npt;
    cent[2] = z / (double) npt;

    mat[0][0] = xsq - x * cent[0];
    mat[0][1] = xy - x * cent[1];
    mat[0][2] = xz - x * cent[2];
    mat[1][0] = xy - y * cent[0];
    mat[1][1] = ysq - y * cent[1];
    mat[1][2] = yz - y * cent[2];
    mat[2][0] = xz - z * cent[0];
    mat[2][1] = yz - z * cent[1];
    mat[2][2] = zsq - z * cent[2];

    /* calculate the plane */
    if (!UTL_GEOM_SYMM_EIGENSYS ((double *)mat, 3, eval, (double *) evec, &nrot)) goto error;

    l = evec[0][0];
    m = evec[1][0];
    n = evec[2][0];
    d = (l * cent[0] + m * cent[1] + n * cent[2]);

    /* perform reflection on the input coordinate sets */
    elem = -1;
    for ( elem = 0; elem < nAtoms; elem++ )
    {
        if ( dn[elem] <= 0 )
            continue;
        xyz = coo + (elem * 3);
        h = l * xyz[0] + m * xyz[1] + n * xyz[2] - d;
        xyz[0] -= 2.0 * l * h;
        xyz[1] -= 2.0 * m * h;
        xyz[2] -= 2.0 * n * h;
    }

    if ( dn ) free((char *) dn );
    return TRUE;

error:
    if ( dn ) free((char *) dn );
    return FALSE;
}

static int setTorsion(double *coo, int nAtoms, int a1, int a2, int a3, int a4, double value )
/* rotates atoms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    int elem;
    int *dn;

    dn = findDirectionalNeighbors(g_ct, a3, a2, -1 );
    angle = UTL_GEOM_TAU( coo+(a1*3), coo+(a2*3), coo+(a3*3), coo+(a4*3) );
    if (UTL_ERROR_IS_SET()) UTL_ERROR_CLEAR();
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;

    UTL_GEOM_MFORM( coo+(a2*3), coo+(a3*3), delta, matrix );
    for ( elem = 0; elem < nAtoms; elem++ )
    {
        if ( dn[elem] > 0 )
            UTL_GEOM_ROTATE( coo+(a3*3), matrix, coo+(elem*3) );
    }

    free((char *) dn );
    return 1;
}

static int setRootTorsion(double *coo, int nAtoms, int a2, int a3, int a4, double value )
/* rotates atoms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    double cord1[3];
    double cord2[3];
    int elem;

    cord1[0] = -1.802;
    cord1[1] = 1.666;
    cord1[2] = 0.0;

    if ( q_coremode_align )
        cord2[0] = -2.004;
    else
        cord2[0] = -0.504;
    cord2[1] = 1.424;
    cord2[2] = 0.0;

    angle = UTL_GEOM_TAU( cord2, coo+(a2*3), coo+(a3*3), coo+(a4*3) );
    if (UTL_ERROR_IS_SET()) UTL_ERROR_CLEAR();
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;
#ifdef DEBUGDETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# root value: %8.3lf %6.0lf %8.3lf\n", angle, value, delta );
#endif

    UTL_GEOM_MFORM( coo+(a2*3), coo+(a3*3), delta, matrix );
    elem = -1;
    for ( elem = 0; elem < nAtoms; elem++ )
        UTL_GEOM_ROTATE( coo+(a3*3), matrix, coo+(elem*3) );
    return 1;
}

static int setBaseTorsion(double *coo, int nAtoms, int a3, int a4, double value )
/* rotates atoms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    double cord1[3];
    double cord2[3];
    int elem;

    cord1[0] = -1.802;
    cord1[1] = 1.666;
    cord1[2] = 0.0;
    cord2[0] = -0.504;
    cord2[1] = 1.424;
    cord2[2] = 0.0;

    angle = UTL_GEOM_TAU( cord1, cord2, coo+(a3*3), coo+(a4*3) );
    if (UTL_ERROR_IS_SET()) UTL_ERROR_CLEAR();
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;

    UTL_GEOM_MFORM( cord2, coo+(a3*3), delta, matrix );
    elem = -1;
    for ( elem = 0; elem < nAtoms; elem++ )
        UTL_GEOM_ROTATE( coo+(a3*3), matrix, coo+(elem*3) );
    return 1;
}

int TOP_SET_TORSION( double *coo, set_ptr atms, int a1, int a2, int a3, int a4, double value )
/* ==================================== */
/* rotates atms to the value for the torsional angle defined by a1,a2,a3,a4, by modifying values in coo
*/
{
    double angle, delta, matrix[3][3];
    int elem;

    angle = UTL_GEOM_TAU( coo+(a1-1)*3, coo+(a2-1)*3, coo+(a3-1)*3, coo+(a4-1)*3 );
    if (UTL_ERROR_IS_SET()) goto error;
    if (angle < 0.0) angle += 360.0;

    while (value < 0.0)
        value += 360.0;
    while (value > 360.0)
        value -= 360.0;

    delta = angle - value;
    UTL_GEOM_MFORM( coo+(a2-1)*3, coo+(a3-1)*3, delta, matrix );
    elem = -1;
    while ((elem = UTL_SET_NEXT( atms, elem)) > 0)
        UTL_GEOM_ROTATE( coo+(a3-1)*3, matrix, coo+(elem-1)*3 );

    return( TRUE );

error:
    return( FALSE );
}
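All of the torsion routines above share the same bookkeeping: the measured angle and the requested value are both brought into the 0 to 360 degree range, and the rotation applied is their difference. A minimal sketch of that arithmetic follows. The names `wrap360` and `torsion_delta` are hypothetical, and the sketch deliberately leaves out the geometry calls (UTL_GEOM_TAU, UTL_GEOM_MFORM, UTL_GEOM_ROTATE), whose behavior is not reproduced here.

```c
/* Hypothetical helper (an assumption): bring an angle in degrees into
   [0, 360] using the same loops as the listing. */
static double wrap360(double a)
{
    while (a < 0.0)
        a += 360.0;
    while (a > 360.0)
        a -= 360.0;
    return a;
}

/* Hypothetical helper (an assumption): rotation, in degrees, that would take
   the measured torsion angle to the requested value. */
static double torsion_delta(double measured, double requested)
{
    return wrap360(measured) - wrap360(requested);
}
```

With a measured torsion of -30 degrees and a requested value of 60 degrees, the difference computed this way is 330 - 60 = 270 degrees.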

int TOP_ALIGN_MOL( double *coo, int natms, int a1, int a2, int a3 )
/* ==================================== */
/* rotates and translates all coordinates so that a1 is at origin, a2 lies along x axis, and a3 lies in the xy
   plane */
{
    double matrix[3][3], tv[3], u[3], *c;
    int i, nc;

    if (!UTL_GEOM_ALIGN(coo+(a1-1)*3, coo+(a2-1)*3, coo+(a1-1)*3, coo+(a3-1)*3, matrix)) goto error;

    if ( q_coremode_align )
        c = coo+(a2-1)*3;
    else
        c = coo+(a1-1)*3;
    for (i = 0; i < 3; i++, c++) {
        u[i] = *c;
        tv[i] = -u[i];
    }

    for (nc = 0, c = coo; nc < natms; nc++) {
        UTL_GEOM_ROTATE( u, matrix, c );
        for (i = 0; i < 3; i++, c++) *c += tv[ i ];
    }

    return TRUE;
error:
    return FALSE;
}

/* New code Sept, 2000 */

/*
   FindBreakPoints - takes in a ct and returns an array the size of the number
   of bonds in the ct. Each cell indicates true or false if this is a break point bond

   break points: are single bonds with at least N heavy atoms on each side of the
   attachment, not in a ring, and optionally they can be terminal atoms

   int minHev - optional argument which forces at least N hev atoms for this to
   be a breakpoint bond.

   int termflag - if true the heavy atoms can be terminal heavy atoms, for example F, Br, Cl

   Author: Rob Jilek Sept, 2000
*/

static Split *FindBreakPoints(CtConnectionTable *ct, int minHev, int termflag, int createFrags )
{
    int *bdata;
    int *singleBonds;
    int *bptr;
    CtBond *bondp;
    int idx;
    int *rb1, *rb2;
    int *atomMask;
    int hevCnt;
    int hevDiff;
    Split *S;
    int bcnt;
    CtBondTypeDef bondType;
    CtSimpleBondTypeDef simpleTypes;

#ifdef DEBUG_VALID_B
    fprintf(stdout, "new breakpoints minHev: %d Allow term: %s\n",
            minHev, (termflag) ? "Yes" : "No" );
#endif

    S = (Split *) 0;

    if ( !ct || !ct->bondCount )
        return S;

    atomMask = createAtomMask(ct, termflag, &hevCnt);

    if ( !q_coremode && qs && q_hevDiff >= 0 )
    {
        hevDiff = abs(hevCnt - qs->numHev);
        if ( hevDiff > q_hevDiff )
        {
            if ( createFrags )
                t_filtered++;
            free((char *) atomMask );
            return S;
        }
    }

    if ( hevCnt < (minHev*2) )
    {
        free((char *) atomMask );
        return S;
    }

    bdata = (int *) calloc(ct->bondCount, sizeof(int) );
    singleBonds = (int *) calloc(ct->bondCount, sizeof(int) );
    S = (Split *) calloc(1, sizeof(Split) );

    for ( idx = 0, bondp = ct->bonds;
          idx < ct->bondCount;
          idx++, bondp++ )
    {
        if ( !( bondp->simpleBondType == CtSimpleBondTypeSingle ||
                bondp->simpleBondType == CtSimpleBondTypeNotSimple ) )
            continue;   /* must be single, check NotSimple next. */

        if ( bondp->simpleBondType == CtSimpleBondTypeNotSimple )
        {
            bondType = DB_CT_GET_BOND_TYPE(ct, STD_ID(idx), &bcnt,
                                           &simpleTypes );
            if ( bondType != CtBondTypeSingle )
                continue;
        }

        if ( AB_IN_RING(bondp) )
            continue;

        singleBonds[idx] = 1;

        if ( minHev > 0 && !validBreakPoint(ct, idx, atomMask, minHev, termflag, &rb1, &rb2 ))
            continue;

        if ( createFrags )
            addSplit2(idx, rb1, rb2 );
        else
        {
            free((char *) rb1 );
            free((char *) rb2 );
            S->s2cnt++;
        }

        bdata[idx] = 1;   /* found a good one */
    }

    if ( createFrags && ( q_do3piece || q_doSubset ) && hevCnt >= (minHev*3) )
        makeSplit3(ct, atomMask, g_split2, g_splitcnt, minHev );

    if ( createFrags )
        S->frags = createUniqFrags(ct->atomCount, g_split2, g_splitcnt, g_split3, g_split3Cnt,
                                   atomMask,
                                   &(S->numFrags) );
    S->numHev = hevCnt;

#ifdef DEBUG_VALID_BXX
    fprintf(stdout, "bonds (base0): ");
    for ( idx = 0;
          idx < ct->bondCount;
          idx++ )
    {
        if ( bdata[idx] )
            fprintf(stdout,"%d ", idx );
    }
    fprintf(stdout,"\n");
#endif

    if ( createFrags )
    {
        S->s2 = g_split2;
        S->s3 = g_split3;
        S->s2cnt = g_splitcnt;
        S->s3cnt = g_split3Cnt;
    }

    S->bondCount = ct->bondCount;
    S->atomCount = ct->atomCount;
    S->bondMask = bdata;
    S->atomMask = atomMask;
    S->singleBonds = singleBonds;
    S->aromSets = (AromSet *) 0;

    g_split2 = (split2 *) 0;
    g_split3 = (split3 *) 0;
    g_splitcnt = g_splitalloc = g_split3Cnt = g_split3Alloc = 0;

    return S;
}

static void freeSplit(Split *s)
{
    int i;
    AromSet *aset;

    if ( !s )
        return;

    freeSplit2(s->s2, s->s2cnt);
    freeSplit3(s->s3, s->s3cnt);
    freeFrags(s->frags, s->numFrags);
    if ( s->bondMask )
        free((char *) s->bondMask );
    if ( s->atomMask )
        free((char *) s->atomMask );
    if ( s->singleBonds )
        free((char *) s->singleBonds );
    if ( s->featureMask )
        free((char *) s->featureMask );
    if ( s->aromMask )
        free((char *) s->aromMask );
    if ( s->aromSets )
    {
        for ( i = 0, aset = s->aromSets; i < s->numArom; i++, aset++ )
            free((char *) aset->atoms);
        free((char *) s->aromSets );
    }

    free((char *) s);
}

static void freeSplit2(split2 *s2, int cnt )
{
    split2 *sptr;
    int i;

    if ( !s2 )
        return;

    for ( i = 0, sptr = s2; i < cnt; sptr++, i++ )
    {
        free((char *) sptr->b1);
        free((char *) sptr->b2);
    }

    free((char *) s2);
}

static void freeSplit3(split3 *s3, int cnt )
{
    split3 *sptr;
    int i;

    if ( !s3 )
        return;

    for ( i = 0, sptr = s3; i < cnt; sptr++, i++ )
    {
        free((char *) sptr->b1);
        free((char *) sptr->b2);
        free((char *) sptr->b3);
        if ( sptr->b4 )
            free((char *) sptr->b4 );
    }

    free((char *) s3);
}

static void freeFrags(Frag *f, int cnt )
{
    Frag *fptr;
    int i, j;

    for ( i = 0, fptr = f; i < cnt; i++, fptr++ )
    {
#ifdef USEHEX
        if ( fptr->topHex )
            free(fptr->topHex );
        if ( fptr->topInt )
            free((char *) fptr->topInt );
#endif
#ifdef STDREGION
        if ( fptr->stdField )
            free((char *) fptr->stdField );
#endif
        if ( fptr->hexDiff )
            free((char *) fptr->hexDiff );
        if ( fptr->featureDiff )
            free((char *) fptr->featureDiff);
        if ( fptr->ct )
            DB_CT_DELETE_CT(fptr->ct);
        else if ( fptr->cords )
            free((char *) fptr->cords);   /* if the ct exists, then coords is a pointer into the
                                             ct's coordinates */
        if ( fptr->origMapping )
            free((char *) fptr->origMapping );
        if ( fptr->cent )
            free((char *) fptr->cent);
        if ( fptr->AtWts )
            free((char *) fptr->AtWts );
        for ( j = 0; j < max_regions; j++ )
        {
            if ( fptr->qtf[j] && fptr->qtf[j] != fptr->topField )
                free((char *) fptr->qtf[j] );
        }
        if ( fptr->topField )
            free((char *) fptr->topField );
    }

    free((char *) f );
}

static void freeFragCts(Split *S)
{
    Frag *fptr;
    int i, j;
    double *coords;

    for ( i = 0, fptr = S->frags; i < S->numFrags; i++, fptr++ )
    {
        if ( fptr->ct && fptr->cords )
        {
            coords = (double *) malloc(fptr->ct->atomCount * sizeof(double) * 3 );
            memcpy((char *) coords, fptr->cords, sizeof(double) * fptr->ct->atomCount * 3);
            fptr->cords = coords;
            DB_CT_DELETE_CT(fptr->ct);
            fptr->ct = (struct CtConnectionTable *) 0;
        }
    }
}

static int freeStrMap(Split *S)
{
    split2 *s2;
    split3 *s3;
    int i;

#ifdef NO_STRMAP
    return -1;
#else
    if ( !S )
        return 0;

    for ( i = 0, s2 = S->s2; i < S->s2cnt; i++, s2++ )
    {
        if ( s2->strMap )
        {
            free((char *) s2->strMap );
            s2->strMap = (int *) 0;
        }
    }
    S->alloc2Map = 0;

    for ( i = 0, s3 = S->s3; i < S->s3cnt; i++, s3++ )
    {
        if ( s3->strMap )
        {
            free((char *) s3->strMap );
            s3->strMap = (int *) 0;
        }
    }
    S->alloc3Map = 0;
#endif
}

static int addSplit2(int bondId, int *b1, int *b2 )
{
    split2 *s;

    if ( g_splitcnt >= g_splitalloc )
    {
        if ( g_split2 && g_splitalloc )
        {
            g_split2 = (split2 *) realloc((char *) g_split2, g_splitalloc * 2 * sizeof(split2) );
            g_splitalloc *= 2;
        }
        else
        {
            g_splitalloc = 3;
            g_split2 = (split2 *) calloc(sizeof(split2), g_splitalloc );
        }
    }

    s = g_split2 + g_splitcnt;
    s->bondId = bondId;
    s->b1 = b1;
    s->b2 = b2;
#ifndef NO_STRMAP
    s->strMap = (int *) 0;
#endif
    g_splitcnt++;
}
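addSplit2 grows its global array by doubling the allocation whenever the count reaches the current capacity. The sketch below shows the same pattern on a small self-contained list. The types `Item` and `ItemList` and the function `item_list_push` are hypothetical names introduced for illustration; unlike the listing, the sketch keeps its state in a struct instead of globals and checks the result of realloc before using it.

```c
#include <stdlib.h>

/* Hypothetical types (assumptions, not from the listing). */
typedef struct { int bondId; } Item;
typedef struct { Item *data; int count; int alloc; } ItemList;

/* Append one entry, doubling the capacity when the list is full.
   Returns 1 on success, 0 if memory could not be obtained. */
static int item_list_push(ItemList *list, int bondId)
{
    if (list->count >= list->alloc) {
        int newAlloc = list->alloc ? list->alloc * 2 : 3;  /* start at 3 */
        Item *grown = (Item *) realloc(list->data, newAlloc * sizeof(Item));
        if (!grown)
            return 0;              /* the old block is still valid */
        list->data = grown;
        list->alloc = newAlloc;
    }
    list->data[list->count].bondId = bondId;
    list->count++;
    return 1;
}
```

Starting from an empty list, ten pushes would grow the capacity from 3 to 6 to 12, so the amortized cost per insertion stays constant.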

static int printBondArray(int atomCnt, int *b)
{
    int i;

    for ( i = 0; i < atomCnt; i++ )
    {
        fprintf(stdout,"%2d ", b[i] );
    }
    fprintf(stdout,"\n");
}

static int addSplit3(int atomCnt, int bond1, int bond2, int *b1, int *b2, int *b3, int firstBase, int
                     secondBase )
{
    split3 *s;

    if ( g_split3Cnt >= g_split3Alloc )
    {
        if ( g_split3 && g_split3Alloc )
        {
            g_split3 = (split3 *) realloc((char *) g_split3, g_split3Alloc * 2 * sizeof(split3) );
            g_split3Alloc *= 2;
        }
        else
        {
            g_split3Alloc = 2;
            g_split3 = (split3 *) calloc(sizeof(split3), g_split3Alloc );
        }
    }

    s = g_split3 + g_split3Cnt;
    s->bond1 = bond1;
    s->bond2 = bond2;
#ifndef NO_STRMAP
    s->strMap = (int *) 0;
#endif
    s->b1 = (int *) malloc(sizeof(int) * atomCnt );
    s->b2 = (int *) malloc(sizeof(int) * atomCnt );
    s->b3 = (int *) malloc(sizeof(int) * atomCnt );
    memcpy((char *) s->b1, (char *) b1, sizeof(int) * atomCnt );
    memcpy((char *) s->b2, (char *) b2, sizeof(int) * atomCnt );
    memcpy((char *) s->b3, (char *) b3, sizeof(int) * atomCnt );

    s->b4 = (int *) malloc(sizeof(int) * atomCnt );
    memcpy((char *) s->b4, (char *) b1, sizeof(int) * atomCnt );
    if ( firstBase >= 0 && secondBase >= 0 )
    {
        s->b4[firstBase] = 1;
        s->b4[secondBase] = -1;   /* this is the base for query */
    }

    g_split3Cnt++;
}

/* returns a true value if the atom arrays overlap and the anchor is contained
   within b1. It returns the index + 1 (base 1) indexed into b1
*/
static int atomsOverlap(int atomcnt, int *b1, int *b2)
{
    int i;
    int overlap = 0;

    for ( i = 0; i < atomcnt; i++ )
    {
        if ( b1[i] == 1 && b2[i] )
            return i+1;
    }
    return 0;
}

static Frag *createUniqFrags(int atomCnt, split2 *s2, int nums2, split3 *s3, int nums3, int *atomMask,
                             int *r_numFrags )
{
    int i;
    split2 *s2ptr;
    split3 *s3ptr;
    Frag *fragHead;
    int no2p;

    g_fragHead = (Frag *) 0;
    g_fragCnt = g_fragAlloc = 0;

    if ( q_coremode == 0 )
        g_fragAlloc = (nums2*2) + (nums3*2);
    else
        g_fragAlloc = nums3*2;

    if ( g_fragAlloc > 0 )
        g_fragHead = (Frag *) calloc(sizeof(Frag), g_fragAlloc );

    no2p = 0;
    if ( !q_coremode || qmode )
    {
        for ( i = 0, s2ptr = s2; i < nums2; i++, s2ptr++ )
        {
            s2ptr->frag1 = createFrag(atomCnt, s2ptr->b1, atomMask, 0 );
            s2ptr->frag2 = createFrag(atomCnt, s2ptr->b2, atomMask, 0 );
        }
    }
    no2p = g_fragCnt;

    if ( q_coremode == 0 )
    {
        for ( i = 0, s3ptr = s3; i < nums3; i++, s3ptr++ )
        {
            s3ptr->frag1 = createFrag(atomCnt, s3ptr->b1, atomMask, 0 );
            s3ptr->frag2 = createFrag(atomCnt, s3ptr->b2, atomMask, 1 );
            s3ptr->frag3 = createFrag(atomCnt, s3ptr->b3, atomMask, 1 );
            s3ptr->frag4 = createFrag(atomCnt, s3ptr->b4, atomMask, 0 );
        }
    }
    else
    {
        for ( i = 0, s3ptr = s3; i < nums3; i++, s3ptr++ )
        {
            s3ptr->frag1 = createFrag(atomCnt, s3ptr->b1, atomMask, 0 );   /* b1 and b4
                                                                              are the center pieces */
            s3ptr->frag2 = createFrag(atomCnt, s3ptr->b4, atomMask, 0 );
        }
    }

    if ( q_debugfp )
        fprintf(q_debugfp, "# There are %d uniq 2D fragments and %d 3D\n", no2p, g_fragCnt
                - no2p );

    tot_frags += nums2 * 2 + nums3 * 3;
    tot_uniq_frags += g_fragCnt;
    compounds++;

    fragHead = g_fragHead;
    *r_numFrags = g_fragCnt;

    g_fragHead = (Frag *) 0;
    g_fragCnt = g_fragAlloc = 0;

    return fragHead;
}

int dump_frag_stats(void)
{
    fprintf(stderr,"AVG uniq frags: %8.3lf AVG frags: %8.3lf # structures for which fragments were built : %d\n",
            (double) ((double) tot_uniq_frags / (double) compounds),
            (double) ((double) tot_frags / (double) compounds),
            compounds);
}

static int masksMatch(int cnt, int *m1, int *m2 )
{
    int rc;

    rc = !memcmp((char *) m1, (char *) m2, sizeof(int) * cnt );
    return rc;
}

static int createFrag(int atomCnt, int *atoms, int *atomMask, int checkDup )
{
    int i, j, found;
    Frag *curr;
    int numAtoms, hevAtoms;
    int baseAtom;

    hevAtoms = hevCount(atomCnt, atoms, atomMask, &numAtoms );
    for ( i = 0, baseAtom = -1; i < atomCnt; i++ )
    {
        if ( atoms[i] == -1 )
        {
            baseAtom = i;
            break;
        }
    }

    if ( baseAtom == -1 )
    {
        fprintf(stderr,"base atom not found\n" );
        for ( i = 0; i < atomCnt; i++ )
            fprintf(stderr,"%d ", atoms[i] );
        fprintf(stderr,"\n");
        return -1;
    }

    if ( checkDup )
    {
        for ( j = 0, curr = g_fragHead; j < g_fragCnt; j++, curr++ )
        {
            if ( curr->baseAtom == baseAtom && curr->atomCnt == numAtoms &&
                 curr->hevCnt == hevAtoms && masksMatch(atomCnt, curr->atoms,
                                                        atoms) )
            {
                return curr->id;
            }
        }
    }

    if ( g_fragCnt >= g_fragAlloc )
    {
#if 0
        fprintf(stderr,"%d %d\n", g_fragCnt, g_fragAlloc );
#endif
        fflush(stderr);
        if ( g_fragHead && g_fragAlloc )
        {
            g_fragAlloc *= 2;
            g_fragHead = (Frag *) realloc((char *) g_fragHead, g_fragAlloc * sizeof(Frag)
                                          );
        }
        else
        {
            g_fragAlloc = 20;
            g_fragHead = (Frag *) calloc(sizeof(Frag), g_fragAlloc );
        }
    }

    curr = g_fragHead + g_fragCnt;
    memset((char *) curr, '\0', sizeof(Frag) );
    curr->baseAtom = baseAtom;
    curr->atomCnt = numAtoms;
    curr->hevCnt = hevAtoms;
    curr->atoms = atoms;
    curr->id = g_fragCnt;
    curr->aromCnt = -1;   /* Indicate not computed */
    g_fragCnt++;
    return curr->id;
}

static int hevCount(int atomcnt, int *b, int *atomMask, int *r_numAtoms )
{
    int hevCnt;
    int numAtoms;
    int i;

    for ( i = hevCnt = numAtoms = 0; i < atomcnt; i++ )
    {
        if ( b[i] )
        {
            numAtoms++;
            if ( atomMask[i] )
                hevCnt++;
        }
    }

    *r_numAtoms = numAtoms;
    return hevCnt;
}

static int makeSplit3(CtConnectionTable *ct, int *atomMask, split2 *sall, int cnt, int minHev )
{
    int i, j, k;
    split2 *s1, *s2;
    CtBond *b1, *b2;
    int *inBoth;
    int *subset1;
    int *subset2;
    int *subset3;
    int *remaining;
    int overlap1, overlap2;
    int numAtoms;
    int numHev;
    int firstBase, secondBase;

    for ( i = 0; i < cnt; i++ )
    {
        s1 = sall + i;
        b1 = ct->bonds + s1->bondId;
        for ( j = i + 1; j < cnt; j++ )
        {
            s2 = sall + j;
            b2 = ct->bonds + s2->bondId;
            firstBase = secondBase = -1;

            overlap1 = atomsOverlap(ct->atomCount, s1->b1, s2->b1);
            overlap2 = atomsOverlap(ct->atomCount, s1->b2, s2->b1);
            if ( !overlap1 || !overlap2 )
            {
                overlap1 = atomsOverlap(ct->atomCount, s1->b1, s2->b2);
                overlap2 = atomsOverlap(ct->atomCount, s1->b2, s2->b2);
                if ( !overlap1 || !overlap2 )
                {
                    continue;
                }
                inBoth = s2->b2;
                subset3 = s2->b1;
            }
            else
            {
                inBoth = s2->b1;
                subset3 = s2->b2;
            }

            if ( inBoth[overlap1 - 1] < inBoth[overlap2 - 1] )
            {
                subset2 = s1->b2;
                remaining = s1->b1;
            }
            else
            {
                subset2 = s1->b1;
                remaining = s1->b2;
            }

            subset1 = (int *) calloc(sizeof(int), ct->atomCount );
#ifdef SPLIT_DEBUG
            if ( q_debugfp )
                fprintf(q_debugfp,"# ");
#endif
            for ( k = 0; k < ct->atomCount; k++ )
            {
                if ( remaining[k] && inBoth[k] )
                {
                    subset1[k] = remaining[k];
                    if ( inBoth[k] == -1 )
                        secondBase = k;
                    if ( remaining[k] == -1 )
                        firstBase = k;
                }
#ifdef SPLIT_DEBUG
                if ( q_debugfp )
                    fprintf(q_debugfp,"%d ", subset1[k] );
#endif
            }
#ifdef SPLIT_DEBUG
            if ( q_debugfp )
            {
                fprintf(q_debugfp,"\n");
                for ( k = 0; k < ct->atomCount; k++ )
                {
                    if ( inBoth[k] == -1 )
                        fprintf(q_debugfp, "# inBoth: %d\n", k );
                }
            }
#endif

#if 0
            numHev = hevCount(ct->atomCount, subset1, atomMask, &numAtoms);
            numHev -= 2;   /* subtract out the attachment atoms */
            if ( numHev < minHev )
            {
                free((char *) subset1);
                continue;
            }
            fprintf(stdout, "3 piece set\n");
            printBondArray(ct->atomCount, s1->b1);
            printBondArray(ct->atomCount, s1->b2);
            printBondArray(ct->atomCount, s2->b1);
            printBondArray(ct->atomCount, s2->b2);
            printBondArray(ct->atomCount, subset1);
            printBondArray(ct->atomCount, subset2);
            printBondArray(ct->atomCount, subset3);
            fprintf(stdout,"\n");
#endif

            addSplit3(ct->atomCount, s1->bondId, s2->bondId, subset1, subset2, subset3,
                      firstBase, secondBase );
            free((char *) subset1 );
        }
    }

    return g_split3Cnt;
}

static int *findDirectionalNeighbors(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx, int
                                     termIdx2 )
/*
   think of the arguments as: ct, to, from
   or
   from the atom (atomIdx) find atoms down the paths except for the terminal atoms

   For example: C is the atom you're interested in,
   and you want to find the atoms going down the paths connected to atoms 3 and 4, so you block 1 and
   2 as terminal.
*/
{
    CtAtom *A;
    CtAtomBondData *bond;
    int *covered;
    int added;
    int level;
    int toAtom;
    int i, j;

    if ( atomIdx < 0 || atomIdx >= ct->atomCount )
        return (int *) 0;
    if ( terminalAtomIdx >= ct->atomCount )
        return (int *) 0;
    if ( termIdx2 >= ct->atomCount )
        return (int *) 0;

    A = ct->atoms + atomIdx;   /* index is zero based */
    covered = (int *) calloc(ct->atomCount, sizeof(int) );
    covered[atomIdx] = 1;

    if ( terminalAtomIdx >= 0 )
        covered[terminalAtomIdx] = -1;   /* -1 means do not cross this atom, it is the
                                            anchor/terminal atom */
    if ( termIdx2 >= 0 )
        covered[termIdx2] = -1;          /* -1 means do not cross this atom, it is the
                                            anchor/terminal atom */

    added = 1;
    for ( level = 1; added && level <= ct->atomCount; level++ )
    {
        for ( i = added = 0; i < ct->atomCount; i++ )
        {
            if ( covered[i] == level )
            {
                A = ct->atoms + i;
                for ( j = 0, bond = A->bond; j < A->bondCount; j++, bond++ )
                {
                    toAtom = bond->toAtom;
                    if ( covered[ toAtom ] )
                        continue;
                    covered[toAtom] = level + 1;
                    added++;
                }
            }
        }
    }

    return covered;
}
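The level-by-level search in findDirectionalNeighbors can be tried on a toy molecule without the connection-table types. The sketch below is an illustration under stated assumptions: `ToyAtom` and `directional_neighbors` are hypothetical names, the adjacency lists stand in for the CtAtom bond data, and only one blocked atom is supported. As in the listing, the start atom is marked 1, a blocked atom is marked -1, and every other reachable atom receives its distance level plus one.

```c
#include <stdlib.h>

#define MAX_NBR 4

/* Hypothetical stand-in for the connection-table atom (an assumption). */
typedef struct { int bondCount; int toAtom[MAX_NBR]; } ToyAtom;

/* Returns a calloc'ed array: 0 = not reached, -1 = blocked atom,
   k > 0 = reached at level k (the start atom is level 1).
   The caller frees the result. */
static int *directional_neighbors(const ToyAtom *atoms, int atomCount,
                                  int start, int blocked)
{
    int *covered, level, added, i, j;

    if (start < 0 || start >= atomCount || blocked >= atomCount)
        return NULL;
    covered = (int *) calloc(atomCount, sizeof(int));
    if (!covered)
        return NULL;

    covered[start] = 1;
    if (blocked >= 0)
        covered[blocked] = -1;       /* never expanded, never crossed */

    added = 1;
    for (level = 1; added && level <= atomCount; level++) {
        added = 0;
        for (i = 0; i < atomCount; i++) {
            if (covered[i] != level)
                continue;            /* only expand the current frontier */
            for (j = 0; j < atoms[i].bondCount; j++) {
                int to = atoms[i].toAtom[j];
                if (covered[to])
                    continue;        /* already reached or blocked */
                covered[to] = level + 1;
                added++;
            }
        }
    }
    return covered;
}
```

On a five-atom chain 0-1-2-3-4, starting at atom 2 with atom 1 blocked, only atoms 2, 3 and 4 are reached. Counting the heavy atoms among the reached entries on each side of a bond is the kind of check a break-point test needs.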

static double *computeVdwWeights(CtConnectionTable *ct, int atomIdx, int terminalAtomIdx, double
                                 reductionFactor, int **r_covered )
/*
   see findDirectionalNeighbors for description. Same thing, only modified for weights
*/
{
    CtAtom *A;
    CtAtomBondData *bond;
    CtBond *bptr;
    int *covered;
    int added;
    int level;
    int toAtom;
    int i, j;
    Split *S;
    int *bsplit;
    int bondIdx;
    double *v_weight;
    double *r_weight;   /* reference weight, so anchor atoms are included in next aggregate */

    v_weight = (double *) calloc(ct->atomCount, sizeof(double) );
    for ( i = 0; i < ct->atomCount; i++ )
        v_weight[i] = 1.0;
    if ( r_covered )
        *r_covered = (int *) 0;

    if ( atomIdx < 0 || atomIdx >= ct->atomCount || reductionFactor == 1.0 )
        return v_weight;
    if ( terminalAtomIdx >= ct->atomCount )
        return v_weight;

    S = FindBreakPoints(ct, 2, 1, 0 );
    if ( !S || S->s2cnt == 0 )
    {
        if ( S )
            freeSplit(S);
        return v_weight;
    }

    bsplit = S->bondMask;

    r_weight = (double *) calloc(ct->atomCount, sizeof(double) );
    for ( i = 0; i < ct->atomCount; i++ )
        r_weight[i] = 1.0;

    A = ct->atoms + atomIdx;   /* index is zero based */
    covered = (int *) calloc(ct->atomCount, sizeof(int) );
    covered[atomIdx] = 1;

    if ( terminalAtomIdx >= 0 )
        covered[terminalAtomIdx] = -1;   /* -1 means do not cross this atom, it is the
                                            anchor/terminal atom */

    added = 1;
    for ( level = 1; added && level <= ct->atomCount; level++ )
    {
        for ( i = added = 0; i < ct->atomCount; i++ )
        {
            if ( covered[i] == level )
            {
                A = ct->atoms + i;
                for ( j = 0, bond = A->bond; j < A->bondCount; j++, bond++ )
                {
                    toAtom = bond->toAtom;
                    if ( covered[ toAtom ] )
                        continue;

                    bondIdx = bond->ptr - ct->bonds;
                    if ( bsplit[bondIdx] )
                        r_weight[toAtom] = r_weight[i] * reductionFactor;
                    else
                        r_weight[toAtom] = r_weight[i];
                    v_weight[toAtom] = r_weight[i];

                    covered[toAtom] = level + 1;
                    added++;
                }
            }
        }
    }

    free((char *) r_weight );
    freeSplit(S);
    if ( r_covered )
        *r_covered = covered;
    else
        free((char *) covered);

    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( v_weight[i] < 0.6 )   /* minimum atom weight */
            v_weight[i] = 0.6;
    }

    return v_weight;
}
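The weighting rule in computeVdwWeights is easiest to follow on an unbranched chain. The sketch below is a simplified illustration, not the routine itself: `chain_weights` is a hypothetical function that walks atoms 0 to n-1 in order, with bond k joining atoms k and k+1, and takes the break-point flags as an input array instead of calling FindBreakPoints. Each atom takes the reference weight of the atom it was reached from, the reference weight is multiplied by the reduction factor after a break-point bond is crossed, and the result is clamped to a minimum.

```c
/* Hypothetical simplification (an assumption): weights along a linear chain.
   isBreak[k] is nonzero when the bond between atoms k and k+1 is a break
   point. v must hold n entries. */
static void chain_weights(int n, const int *isBreak, double factor,
                          double minWeight, double *v)
{
    double r = 1.0;   /* reference weight of the atom just left */
    int k;

    if (n <= 0)
        return;
    v[0] = 1.0;
    for (k = 0; k + 1 < n; k++) {
        v[k + 1] = r;             /* inherit the weight of the previous atom */
        if (isBreak[k])
            r *= factor;          /* atoms beyond this bond count for less */
        if (v[k + 1] < minWeight)
            v[k + 1] = minWeight; /* same floor idea as the 0.6 minimum */
    }
}
```

Note that the atom immediately across a break-point bond keeps the unreduced weight, which matches the listing's comment that anchor atoms are included in the next aggregate.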

int TOP_HEV_COUNT(struct CtConnectionTable *ct)
{
    CtAtom *atomp;
    int i;
    int hevCount;

    for ( i = hevCount = 0, atomp = ct->atoms; i < ct->atomCount; i++, atomp++ )
    {
        if ( atomp->class != CtAtomElement )
            continue;
        if ( atomp->id.atomicNumber != HYDROGEN )
            hevCount++;
    }

    return hevCount;
}

static int *createAtomMask(CtConnectionTable *ct, int termflag, int *r_hevCount)
{
    int *atomMask;
    CtAtom *atomp;
    int i;
    int hevCount;

    atomMask = (int *) calloc(ct->atomCount, sizeof(int) );

    for ( i = hevCount = 0, atomp = ct->atoms; i < ct->atomCount; i++, atomp++ )
    {
        if ( atomp->class != CtAtomElement )
            continue;
        if ( atomp->id.atomicNumber == HYDROGEN )
            continue;
        hevCount++;   /* count hev if terminal or not */
        if ( !termflag && atomp->bondCount == 1 )
            continue;   /* don't count the terminal atoms */
        atomMask[i] = 1;
    }

    *r_hevCount = hevCount;
    return (atomMask);
}

/*
   for a bond in a ct determine if by splitting this bond the two remaining pieces
   contain at least N minimum number of heavy atoms (variable minHev). The terminal flag, if
   set to true, counts terminal atoms; otherwise, when false, terminal atoms are not
   counted even if they are heavy atoms.

   Two arrays are returned the size of ct->atomCount; a three way indicator is set for
   each atom in each set.

    0: atom is not in set
    1: atom is in set
   -1: atom is the anchor atom in the set.
*/


static int validBreakPoint(CtConnectionTable *ct, int bondidx, int *atomMask, int minHev, int termflag,
                           int **rb1, int **rb2)
{
    CtBond *bondp;
    CtAtom *atomp;
    int *d1, *d2;
    int d1hevcnt, d2hevcnt;
    int termPassed;
    int i;
#ifdef DEBUG_VALID_B
    int d1cnt, d2cnt;
#endif

    bondp = ct->bonds + bondidx;

    atomp = ct->atoms + bondp->atoms[0];
    if ( atomp->class != CtAtomElement || atomp->id.atomicNumber == HYDROGEN )
        return 0;

    atomp = ct->atoms + bondp->atoms[1];
    if ( atomp->class != CtAtomElement || atomp->id.atomicNumber == HYDROGEN )
        return 0;

    d1 = findDirectionalNeighbors(ct, bondp->atoms[0], bondp->atoms[1], -1 );
    d2 = findDirectionalNeighbors(ct, bondp->atoms[1], bondp->atoms[0], -1 );

#ifdef DEBUG_VALID_B
    d1cnt = d2cnt = 0;
    fprintf(stdout,"atom set: %d %d\n", bondp->atoms[0] + 1, bondp->atoms[1] + 1 );
    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( d1[i] > 0 )
            fprintf(stdout,"%d ", i+1 );
    }
    fprintf(stdout,"\n");

    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( d2[i] > 0 )
            fprintf(stdout,"%d ", i+1 );
    }
    fprintf(stdout,"\n");
#endif

    for ( i = d1hevcnt = d2hevcnt = 0; i < ct->atomCount; i++ )
    {
#ifdef DEBUG_VALID_B
        if ( d1[i] > 0 )
            d1cnt++;
        if ( d2[i] > 0 )
            d2cnt++;
#endif
        if ( atomMask[i] )
        {
            if ( d1[i] > 0 )
                d1hevcnt++;
            if ( d2[i] > 0 )
                d2hevcnt++;
        }
    }

#ifdef DEBUG_VALID_B
    fprintf(stdout,"%d of %d and %d of %d \n",
            d1hevcnt, d1cnt, d2hevcnt, d2cnt );
#endif

    if ( d1hevcnt < minHev || d2hevcnt < minHev )
    {
        *rb1 = (int *) 0;
        *rb2 = (int *) 0;
        free(d1);
        free(d2);
        return 0;
    }

    *rb1 = d1;
    *rb2 = d2;
    return 1;
}
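The break-point test above amounts to counting the heavy atoms on each side of a candidate bond and rejecting the bond if either side is too small. A minimal sketch of that idea follows; the `Bond` record and the function names are hypothetical, and a plain flood fill stands in for `findDirectionalNeighbors`, whose behaviour (for example on ring bonds, where this sketch would reach every atom from both sides) is not shown in the listing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bond record: the two atom indices it joins. */
typedef struct { int a, b; } Bond;

/* Count heavy atoms reachable from `start` without crossing bond `cut`.
 * Returns -1 if memory cannot be allocated. */
static int side_heavy_count(int nAtoms, const Bond *bonds, int nBonds,
                            const int *isHeavy, int cut, int start)
{
    int *seen = calloc(nAtoms, sizeof *seen);
    int *stack = malloc(nAtoms * sizeof *stack);
    int top = 0, count = 0;

    if (!seen || !stack) { free(seen); free(stack); return -1; }
    stack[top++] = start;
    seen[start] = 1;
    while (top > 0) {
        int cur = stack[--top];
        if (isHeavy[cur])
            count++;
        for (int k = 0; k < nBonds; k++) {
            int other = -1;
            if (k == cut)
                continue;
            if (bonds[k].a == cur) other = bonds[k].b;
            else if (bonds[k].b == cur) other = bonds[k].a;
            if (other >= 0 && !seen[other]) {
                seen[other] = 1;          /* each atom is pushed once */
                stack[top++] = other;
            }
        }
    }
    free(seen);
    free(stack);
    return count;
}

/* A bond is an acceptable break point when both sides keep at least
 * minHev heavy atoms. */
static int is_valid_break(int nAtoms, const Bond *bonds, int nBonds,
                          const int *isHeavy, int cut, int minHev)
{
    int n1 = side_heavy_count(nAtoms, bonds, nBonds, isHeavy, cut, bonds[cut].a);
    int n2 = side_heavy_count(nAtoms, bonds, nBonds, isHeavy, cut, bonds[cut].b);
    return n1 >= minHev && n2 >= minHev;
}
```

For a five-atom chain, cutting the second bond leaves pieces of two and three heavy atoms, so the bond qualifies with a minimum of two but not with a minimum of three.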

static int BuildFrags(Split *S)
{
    int i, j;
    Frag *curr;
    int *atoms;
    int cnt;
    int atomCount;
    int *aptr;
    int atomsBaseIdx = -1;
    int copyBaseIdx;
    int *ordering;
    int natms;
    double *coo;
    struct CtConnectionTable *ct;

    if ( !S || !S->ct )
    {
        fprintf(stderr, "Build frags has no ct to copy from \n");
        return -1;
    }

    if ( S->fragsBuilt )
        return 0;

    S->fragsBuilt = 1;

    ct = S->ct;
    atomCount = ct->atomCount;
    atoms = (int *) malloc( atomCount * sizeof(int) );

    for ( i = 0, curr = S->frags; i < S->numFrags; i++, curr++ )
    {
        if ( curr->ct )
        {
            continue; /* already built */
        }

        memset((char *) atoms, '\0', sizeof(int) * atomCount );
        atomsBaseIdx = -1;

        for ( j = cnt = 0, aptr = curr->atoms; j < atomCount; j++, aptr++ )
        {
            if ( *aptr )
            {
                if ( *aptr == -1 )
                    atomsBaseIdx = j;
                atoms[cnt] = j + 1;
                cnt++;
            }
        }

        curr->ct = DB_CT_UTL_COPY_CT(ct, cnt, atoms, &ordering, CtCopyKeepAllAttrs );
        if ( !curr->ct )
            continue;

        copyBaseIdx = -1;
        for ( j = 0; j < cnt; j++ )
        {
            if ( ordering[j] == atomsBaseIdx )
                copyBaseIdx = j;
        }

        curr->copyBaseAtom = copyBaseIdx;
        if ( copyBaseIdx == -1 )
            continue;

        curr->origMapping = (int *) malloc(sizeof(int) * cnt );
        memcpy((char *) curr->origMapping, (char *) ordering, sizeof(int) * cnt );

        DB_CT_UTL_FIND_RINGS(curr->ct);
        UTL_ERROR_CLEAR();

        DB_CT_GET_CT_ATTR( curr->ct, CtCt3DCoordSet, &coo, &natms);
        curr->cords = coo;

        /* align compound according to topomer rules - all trans */
        topAlignCt(curr->ct, curr->copyBaseAtom, S->featureMask, curr->origMapping );
        if ( qmode )
            getQueryExtents(curr->cords, curr->atomCnt);
    }

    if ( atoms )
        free((char *) atoms );
    return 0;
}

static void getQueryExtents(double *coords, int atomCnt )
{
    int i;
    double x, y, z;

    for ( i = 0; i < atomCnt; i++ )
    {
        x = *coords;
        y = *(coords+1);
        z = *(coords+2);
        coords += 3;

        if ( x < qxmin )
            qxmin = x;
        if ( x > qxmax )
            qxmax = x;
        if ( y < qymin )
            qymin = y;
        if ( y > qymax )
            qymax = y;
        if ( z < qzmin )
            qzmin = z;
        if ( z > qzmax )
            qzmax = z;
    }
}

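The extent tracking above keeps a running axis-aligned bounding box over packed (x, y, z) coordinates. A minimal sketch of the same idea, using a local struct in place of the listing's `qxmin` ... `qzmax` globals, is shown below; the `Extents` type and both function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical container for the running bounding box. */
typedef struct {
    double min[3];
    double max[3];
} Extents;

/* Start with an "empty" box so that the first point always widens it. */
static void extents_init(Extents *e)
{
    for (int k = 0; k < 3; k++) {
        e->min[k] = DBL_MAX;
        e->max[k] = -DBL_MAX;
    }
}

/* coords holds atomCnt packed (x, y, z) triples. */
static void extents_update(Extents *e, const double *coords, int atomCnt)
{
    for (int i = 0; i < atomCnt; i++, coords += 3) {
        for (int k = 0; k < 3; k++) {
            if (coords[k] < e->min[k]) e->min[k] = coords[k];
            if (coords[k] > e->max[k]) e->max[k] = coords[k];
        }
    }
}
```

Calling `extents_update` once per fragment accumulates the box over all query fragments, which is how the listing appears to use its global extents.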

static int BuildTopomers(CtConnectionTable *ct, Split *S, Split *query)
{
    int i, j;
    Frag *curr;
    int cnt;
    int atomCount;
    int *aptr;
    int a1;
    int genHex;
    double outside;
    static IRegionPtr r;
    double *cf;
    double *cf2;
    char *hexStr;
    int *fragMask;
    split2 *qs2;
    split3 *qs3;
    split2 *s2;
    split3 *s3;
    int topskip;

    if ( !S || !ct )
        return -1;

    if ( !stdRegion )
        stdRegion = TOP_MAKE_STD_REGION();
    UTL_ERROR_CLEAR();

    if ( !q_matrixMode )
        makeTopRegions(q_stepSize, S->numFrags);
    else
    {
        regions[0] = stdRegion;
        maxregions = 1;
    }

    S->ct = ct;
    BuildFrags(S);

    genHex = 0;
#ifdef USE_HEX
    genHex = 1;
#endif

    if ( q_debugfp )
        genHex = 1;
    fragMask = (int *) 0;
#ifndef NO_STRMAP
    if ( query )
    {
        fragMask = (int *) calloc(S->numFrags, sizeof(int) );

        /* Find which fragments to actually build the topomer fields for: only those
           whose features don't disqualify this fragment combination */

        if ( query->s2 && S->s2 && q_do2piece && query->alloc2Map )
        {
            for ( i = 0, qs2 = query->s2;
                  i < query->s2cnt && qs2->strMap;
                  i++, qs2++ )
            {
                for ( j = 0; j < S->s2cnt; j++ )
                {
                    if ( qs2->strMap[j] )
                    {
                        s2 = S->s2 + j;
                        fragMask[ s2->frag1 ] = 1;
                        fragMask[ s2->frag2 ] = 1;
                    }
                }
            }
        }

        if ( query->s3 && S->s3 && q_do3piece && query->alloc3Map )
        {
            for ( i = 0, qs3 = query->s3; qs3->strMap && i < query->s3cnt; i++, qs3++ )
            {
                for ( j = 0; j < S->s3cnt; j++ )
                {
                    if ( qs3->strMap[j] )
                    {
                        s3 = S->s3 + j;
                        fragMask[ s3->frag1 ] = 1;
                        fragMask[ s3->frag2 ] = 1;
                        fragMask[ s3->frag3 ] = 1;
                        fragMask[ s3->frag4 ] = 1;
                    }
                }
            }
        }

        if ( query->s2 && S->s3 && q_doSubset && query->allocSubsetMap )
        {
            for ( i = 0, qs2 = query->s2; qs2->subsetMap && i < query->s2cnt; i++, qs2++ )
            {
                for ( j = 0; j < S->s3cnt; j++ )
                {
                    if ( qs2->subsetMap[j] )
                    {
                        s3 = S->s3 + j;
                        fragMask[ s3->frag1 ] = 1;
                        fragMask[ s3->frag2 ] = 1;
                        fragMask[ s3->frag3 ] = 1;
                        fragMask[ s3->frag4 ] = 1;
                    }
                }
            }
        }
    }
#endif

    for ( i = topskip = 0, curr = S->frags; i < S->numFrags; i++, curr++ )
    {
        if ( !curr->ct || curr->copyBaseAtom == -1 || !curr->cords )
            continue;

#if 0
        if ( q_coremode && !qmode && i%2 )
            continue;
#endif

#ifndef NO_STRMAP
        if ( fragMask && fragMask[i] == 0 )
        {
            topskip++;
            continue;
        }
#endif

        if ( q_debugfp )
        {
            writeCopy(q_debugfp, curr->ct, i, -1, (searchCnt > 0 ) ? "TS_SID" : "TS_QID");
            if ( debug2 )
                writeCopy(debug2, curr->ct, i, -1, (searchCnt > 0 ) ? "TS_SID" : "TS_QID" );
        }

        a1 = curr->copyBaseAtom;
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
        {
            fprintf(q_debugfp,"#frag: %d base: %d atomCnt:%d\n",
                    i + 1, a1 + 1, ct->atomCount);
        }
#endif

        curr->AtWts = computeVdwWeights(ct, a1, -1, q_ReductionFactor, (int **) 0 );

        if ( curr->id >= S->s2cnt )
            minRegion = minRegion3P;
        else
            minRegion = minRegion2P;

        if ( !qmode )
        {
            r = getRegionToUse(curr->cords, curr->atomCnt, &(curr->regionIdx),
                               &(curr->npoints) );

            curr->outside = atomsOutside(curr->cords, curr->atomCnt, r, curr->AtWts,
                                         &(curr->outsidePenalty) );

            curr->topField = TOP_STER_ATOM_EVAL_ALL_RB_ATTEN(curr->ct, r,
                                                             a1 + 1,
                                                             curr->cords, curr->AtWts );

#ifndef NO_COMPRESSION
            cf = compressField(curr->topField, r->n_points );
            curr->topField = cf;
#endif

#ifdef STD_REGION
            curr->stdField = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct, stdRegion,
                                                        a1 + 1,
                                                        curr->cords, curr->AtWts );
#endif
        }
        else
        {
            r = getRegionToUse(curr->cords, curr->atomCnt, &(curr->regionIdx),
                               &(curr->npoints) );

            if ( curr->id >= S->s2cnt && curr->regionIdx > minRegion3P )
            {
                minRegion3P = curr->regionIdx;
            }

            if ( curr->regionIdx > minRegion2P )
                minRegion2P = curr->regionIdx;
            else
                curr->regionIdx = minRegion2P;

            for ( j = 0; j < maxregions; j++ )
            {
                r = regions[j];
#if 0
                curr->qtf[j] = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct,
                                                          regions[j], a1 + 1,
                                                          curr->cords, curr->AtWts );
                compareFields(curr->qtf[j], cf, r->n_points );
                cf2 = compressField(cf, r->n_points );
                free((char *) cf2 );
#endif
                curr->qtf[j] = TOP_STER_ATOM_EVAL_ALL_RB_ATTEN(curr->ct,
                                                               regions[j], a1 + 1,
                                                               curr->cords, curr->AtWts );
#ifndef NO_COMPRESSION
                cf = compressField(curr->qtf[j], r->n_points );
                curr->qtf[j] = cf;
#endif
            }

            if ( !((i+1) % 10) )
                fprintf(stderr,"Built Query fragments: %d of %d\n", i+1, S->numFrags );

#ifdef STD_REGION
            curr->stdField = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct, stdRegion,
                                                        a1 + 1,
                                                        curr->cords, curr->AtWts );
#endif
        }

        if ( q_debugfp && !qmode && curr->topField )
        {
            /* curr->topHex */
            cf = TOP_STER_EVAL_ALL_RB_ATTEN(curr->ct, stdRegion, a1 + 1,
                                            curr->cords, curr->AtWts );
            hexStr = strdup(CT_FIELD2HEX(cf, stdRegion->n_points));
            fprintf(q_debugfp, "# %s\n", hexStr );
#ifdef NO_COMPRESSION
            free((char *) cf); /* don't free the field with compression enabled */
#endif
            free((char *) hexStr );
        }
    }

    if ( fragMask )
        free((char *) fragMask );

#if 0
    if ( topskip )
        fprintf(stderr, "skipped building %d of %d topomers\n", topskip, S->numFrags );
#endif

    return 0;
}

static CtBond *getBond(struct CtConnectionTable *ct, int id1, int id2 )
{
    int i;
    CtAtomBondData *abd;
    CtAtom *a;

    a = ct->atoms + id1;

    for ( i = 0, abd = a->bond; i < a->bondCount; i++, abd++ )
    {
        if ( abd->toAtom == id2 )
            return abd->ptr;
    }

    return (CtBond *) 0;
}

/*
   Align the ct fragment according to topomer alignment rules:
   adjust all torsions to a trans position for all single bonds with
   non-terminal atoms, and do reflection if needed for all prochiral atoms.
   Rob Jilek: Nov. 2000
*/
static int topAlignCt(struct CtConnectionTable *ct, int baseAtom, int *featureMask, int *ctMapping )
{
    int *atoms;
    int *atomDist;
    int *singleBonds;
    int *toAtoms;
    int *secChoice;
    double *molWeights;
    int i, j;
    int idx;
    int status;
    int distance;
    CtAtom *atomp;
    CtAtomBondData *bi;
    CtBond *bondp;
    CtBondTypeDef bondType;
    CtSimpleBondTypeDef simpleTypes;
    int bcnt;
    int priority[4];
    struct cipSupportDef *support;
    int a0, a1, a2, a3;
    int rbondsJoined;
    double *cord;
    double torsion;
    int dorefle;
    int mode;
    int hcnt, fcnt, clcnt, brcnt;
    int planeAtoms[3];
    char *atomMessage[] = { "na", "important", "chiral" };
    double *tors;

    if ( !DB_CT_GET_CT_ATTR( ct, CtCt3DCoordSet, &cord, &i) )
        return -1;

    g_ct = ct;

    singleBonds = (int *) calloc(sizeof(int), ct->bondCount );
    atoms = (int *) calloc(sizeof(int), ct->atomCount );
    tors = (double *) calloc(sizeof(double), ct->atomCount );

    for ( idx = 0, bondp = ct->bonds;
          idx < ct->bondCount;
          idx++, bondp++ )
    {

#define TOP_ALIGN_DOUBLE
#ifdef TOP_ALIGN_DOUBLE
        if ( bondp->simpleBondType == CtSimpleBondTypeNotSimple )
        {
            bondType = DB_CT_GET_BOND_TYPE(ct, STD_ID(idx), &bcnt,
                                           &simpleTypes );
            if ( !(bondType == CtBondTypeSingle || bondType == CtBondTypeDouble) )
                continue;
        }
        else
        {
            if ( !( bondp->simpleBondType == CtSimpleBondTypeSingle ||
                    bondp->simpleBondType == CtSimpleBondTypeDouble ) )
                continue; /* must be single or double */
        }
#else
        if ( !( bondp->simpleBondType == CtSimpleBondTypeSingle ||
                bondp->simpleBondType == CtSimpleBondTypeNotSimple ) )
            continue; /* must be single, check NotSimple next. */

        if ( bondp->simpleBondType == CtSimpleBondTypeNotSimple )
        {
            bondType = DB_CT_GET_BOND_TYPE(ct, STD_ID(idx), &bcnt,
                                           &simpleTypes );
            if ( bondType != CtBondTypeSingle )
                continue;
        }
#endif

#if 0
        if ( AB_IN_RING(bondp) )
            continue;

        /* Jan, 16th 2000 - align the hydrogens and other terminal atoms */
        /* if either atom attached to this bond is terminal, then ignore this bond */
        atomp = ct->atoms + bondp->atoms[0];
        if ( atomp->bondCount == 1 )
            continue;

        atomp = ct->atoms + bondp->atoms[1];
        if ( atomp->bondCount == 1 )
            continue;
#endif

        /* We have a bond and the atoms we wish to adjust the torsions on */
        singleBonds[idx] = 1;
        atoms[ bondp->atoms[0] ] = 1;
        atoms[ bondp->atoms[1] ] = 1;
    }

    /* now add in the prochiral atoms */
    support = DB_CT_CHIRAL_CIP_SETUP();
    for ( i = 1; i <= ct->atomCount; i++ )
    {
        status = DB_CT_UTL_IS_CHIRAL_TYPE(ct, i, 1, 1, &hcnt, &fcnt, &clcnt, &brcnt );
        if ( status == 0 )
            continue;
        if ( status == -1 )
        {
            UTL_ERROR_CLEAR();
            continue;
        }

        status = DB_CT_CHIRAL_GET_RS_PRIORITY(ct, i, priority, support );
        if ( status == 0 )
            continue;

        atoms[i-1] = 2; /* mark it differently: this is a prochiral atom */
    }
    DB_CT_CHIRAL_CIP_FREE(support);

    atomDist = findDirectionalNeighbors(ct, baseAtom, -1, -1 );
    molWeights = computePathWeights(ct, baseAtom, atomDist, featureMask, ctMapping );
    toAtoms = findLargestBranch(ct, atomDist, molWeights );
    g_atomDist = atomDist;

    a1 = baseAtom;
    a2 = toAtoms[a1];
    a3 = toAtoms[a2];
    if ( a3 == -1 )
        TOP_ALIGN_MOL(cord, ct->atomCount, a1 + 1, a2 + 1, a2 + 1); /* function wants base 1 atom ids */
    else
        TOP_ALIGN_MOL(cord, ct->atomCount, a1 + 1, a2 + 1, a3 + 1); /* function wants base 1 atom ids */

    rbondsJoined = 0;
    bondp = getBond(ct, a2, a3);
    if ( bondp && AB_IN_RING(bondp) )
        rbondsJoined++;

    torsion = 180.0;
    if ( rbondsJoined == 1 )
        torsion = 90.0;

    /* where a1 is baseAtom, a2 is toAtoms[a1], etc */
    if ( a3 != -1 )
        setRootTorsion(cord, ct->atomCount, a1, a2, a3, torsion );

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "# root: fixed %d %d %d %6.0lf\n", a1, a2, a3, torsion );
        for ( i = 0; i < ct->atomCount; i++ )
        {
            fprintf(q_debugfp, "# toAtom %d -> %d (%d %d)\n",
                    i, toAtoms[i], atomDist[i],
                    ( toAtoms[i] >= 0 ) ? atomDist[ toAtoms[i] ] : -1 );
        }
    }
#endif

    /* now adjust the torsion in atom distance order */

    for ( distance = 2; distance <= ct->atomCount; distance++ )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            if ( atoms[i] == 0 || i == baseAtom )
                continue; /* not interested in this atom */

            if ( atomDist[i] != distance )
                continue; /* we are not doing this distance from the base atom now */

            if ( atoms[i] == 2 && !getFromRingCount(ct, atomDist, i, toAtoms[i] ) ) /* a chiral atom */
            {
                /* we can NOT convert if either main chain bonds are in a ring */
                a0 = baseAtom;
                a2 = i;
                getFromChiralAtoms(ct, atomDist, molWeights, i, toAtoms[i], &a1, &a3 );

#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# reflect torsion atoms: %d %d %d %d \n",
                            a0, a1, a2, a3 );
#endif

                if ( a0 != -1 && a1 != -1 && a2 != -1 && a3 != -1 && a0 != a1 )
                {
                    torsion = UTL_GEOM_TAU( cord+(a0*3), cord+(a1*3),
                                            cord+(a2*3), cord+(a3*3) );
                    UTL_ERROR_CLEAR();
                    if ( torsion < 0.0 )
                        torsion += 360.0;
                    mode = (atomDist[i] - 1) % 2;

#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# reflect torsion: %d %d %d %d %6.0lf mode:%d\n",
                                a0, a1, a2, a3, torsion, mode );
#endif

#ifdef ALTERNATE_CHIRAL
                    if ( mode == 1 && torsion > 180.0 || mode == 0 && torsion < 180.0 )
                    {
#endif
                        planeAtoms[0] = a1;
                        planeAtoms[1] = i;
                        planeAtoms[2] = toAtoms[i];
                        reflectAtoms(cord, ct->atomCount, 3, planeAtoms );
                        tors[i] = torsion * 100.0;

#ifdef DEBUG_DETAIL
                        if ( q_debugfp )
                            fprintf(q_debugfp, "# reflected: %d %d %d\n",
                                    planeAtoms[0], planeAtoms[1],
                                    planeAtoms[2] );
#endif

#ifdef ALTERNATE_CHIRAL
                    }
#endif
                }
            }

            a1 = i;
            atomp = ct->atoms + i;

            for ( j = 0, bi = atomp->bond; j < atomp->bondCount; j++, bi++ )
            {
                if ( atomDist[ bi->toAtom ] != (distance+1) )
                    continue;
                idx = bi->ptr - ct->bonds;

#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# atominfo %d %d (%d %d) = %d\n",
                            a1 + 1, bi->toAtom + 1,
                            bi->ptr->refIdx, idx,
                            singleBonds[idx] );
#endif

                if ( singleBonds[ idx ] == 0 ) /* make sure rotatable bond */
                    continue;
                a2 = bi->toAtom;

                a0 = getFromAtom(ct, atomDist, molWeights, i, a2, baseAtom, cord );

                /* a2 = toAtoms[i]; */
                a3 = toAtoms[a2];

                if ( a0 == -1 || a1 == -1 || a2 == -1 || a3 == -1 )
                {
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# not aligned one or more of the atom ids is -1: %d %d %d %d\n",
                                a0, a1, a2, a3 );
                    continue;
                }

                /* count the number of ring bonds joined */
                rbondsJoined = 0;
                bondp = getBond(ct, a0, a1);
                if ( bondp && AB_IN_RING(bondp) )
                    rbondsJoined++;

                bondp = getBond(ct, a2, a3);
                if ( bondp && AB_IN_RING(bondp) )
                    rbondsJoined++;

                torsion = 180.0;
                if ( rbondsJoined == 1 )
                    torsion = 90.0;
                else if ( rbondsJoined == 2 )
                    torsion = 60.0;
                setTorsion(cord, ct->atomCount, a0, a1, a2, a3, torsion );
                tors[a1] = torsion;

#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# torsion: %d %d %d %d %6.0lf\n", a0, a1,
                            a2, a3, torsion );
#endif
            }
        }
    }

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            fprintf(q_debugfp, "# %2d: %2d %2d %8.2lf %7.2lf %s \n",
                    i+1, atomDist[i], toAtoms[i], molWeights[i], tors[i],
                    atomMessage[ atoms[i] ]);
        }
    }
#endif

    free((char *) atomDist);
    free((char *) molWeights);
    free((char *) toAtoms );
    free((char *) singleBonds);
    free((char *) atoms );
    free((char *) tors );
    return 0;
}


static int getFromAtom(struct CtConnectionTable *ct, int *atomdist, double *molWeights, int atom, int toAtom,
                       int baseAtom, double *cord )
{
    int i;
    int bestb[4];
    int nlowest;
    int nbest;
    double bestw;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;
    double tors[4];
    double tlow;

    A = ct->atoms + atom;

    if ( atomdist[atom] == 1 )
        return -1;

    /* otherwise it isn't the base atom */
    bestw = -1.0;
    bestb[0] = bestb[1] = bestb[2] = bestb[3] = -1;
    nbest = 0;

    for ( i = nbest = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomdist[ abd->toAtom ] == ( atomdist[ atom ] - 1 ) )
        {
            if ( molWeights[ abd->toAtom ] > bestw )
            {
                nbest = 0;
                bestw = molWeights[ abd->toAtom ];
                bestb[nbest] = abd->toAtom;
                nbest++;
            }
            else if ( molWeights[ abd->toAtom ] == bestw && nbest < 4 )
            {
                bestb[nbest] = abd->toAtom;
                nbest++;
            }
        }
    }

    if ( nbest > 1 )
    {
        /* must break the tie */
        for ( i = nlowest = 0, tlow = 400.0; i < nbest; i++ )
        {
            tors[i] = UTL_GEOM_TAU(cord+(baseAtom*3), cord+(atom*3),
                                   cord+(toAtom*3), cord+(bestb[i]*3) );

            while ( tors[i] < 0.0 )
                tors[i] += 360.0;
            while ( tors[i] > 360.0 )
                tors[i] -= 360.0;
            UTL_ERROR_CLEAR();

#if 0
            if ( tors[i] < 90.0 )
                return bestb[i];
#endif

            if ( tors[i] < tlow )
            {
                nlowest = i;
                tlow = tors[i];
            }
        }

        return bestb[nlowest];
    }
    else if ( nbest == 1 )
        return bestb[0];
    return -1;
}
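The tie-break in `getFromAtom` normalizes each candidate torsion into one turn and keeps the candidate with the smallest angle. The sketch below isolates that selection step; the function names are hypothetical, the angles are passed in directly rather than computed with `UTL_GEOM_TAU`, and the normalization maps into [0, 360), whereas the listing leaves an angle of exactly 360.0 unchanged.

```c
#include <assert.h>

/* Map an angle in degrees into [0, 360). */
static double normalize_deg(double t)
{
    while (t < 0.0)
        t += 360.0;
    while (t >= 360.0)
        t -= 360.0;
    return t;
}

/* Return the index of the candidate with the smallest normalized torsion.
 * The earliest candidate wins a tie; -1 is returned when n <= 0. */
static int pick_lowest_torsion(const double *tors, int n)
{
    int best = -1;
    double low = 400.0;     /* sentinel above any normalized angle */

    for (int i = 0; i < n; i++) {
        double t = normalize_deg(tors[i]);
        if (t < low) {
            low = t;
            best = i;
        }
    }
    return best;
}
```

Because the comparison is a strict `<`, two candidates with identical torsions resolve to the one encountered first, which matches the listing's behaviour.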

static int getFromRingCount(struct CtConnectionTable *ct, int *atomdist, int atom, int toAtom )
{
    int i;
    int rcnt;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;

    A = ct->atoms + atom;

    if ( atomdist[atom] == 1 )
        return 0;

    /* otherwise it isn't the base atom */

    for ( i = rcnt = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomdist[ abd->toAtom ] == ( atomdist[ atom ] - 1 ) )
        {
            if ( AB_IN_RING(abd->ptr) )
                rcnt++;
        }
        else if ( abd->toAtom == toAtom && AB_IN_RING(abd->ptr) )
            rcnt++;
    }

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# atom:%d rcnt:%d\n", atom, rcnt );
#endif

    return rcnt;
}


static int getFromChiralAtoms(struct CtConnectionTable *ct, int *atomdist, double *molw, int atom, int toAtom,
                              int *r_fromAtom, int *r_toatom)
{
    int i;
    int ids[2];
    int weight[2];
    int idx = 0;
    int rcnt;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;
    int t_toAtom, t_length;
    double theWeight;

    A = ct->atoms + atom;
    *r_fromAtom = *r_toatom = -1;

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# chiral atom: %d bondcount: %d toAtom:%d \n",
                atom, A->bondCount, toAtom );
#endif

    for ( i = rcnt = idx = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
            fprintf(q_debugfp, "# atom: %d dist:%d toatom:%d dist:%d \n",
                    atom, atomdist[atom], abd->toAtom, atomdist[ abd->toAtom ] );
#endif

        if ( abd->toAtom == toAtom || idx >= 2 )
            continue;

        if ( atomdist[ abd->toAtom ] <= ( atomdist[ atom ] - 1 ) )
        {
            *r_fromAtom = abd->toAtom;
            continue;
        }

        ids[idx] = abd->toAtom;
        t_toAtom = t_length = -1;
        theWeight = -1.0;

        traverseBranch(ct, abd->toAtom, atomdist, molw, abd->toAtom, &t_toAtom, &t_length,
                       &theWeight);

        weight[idx] = theWeight;
        idx++;
    }

    if ( idx == 2 )
    {
        if ( weight[0] >= weight[1] )
            *r_toatom = ids[0];
        else
            *r_toatom = ids[1];
    }
    else if ( idx == 1 )
        *r_toatom = ids[0];

    return 0;
}

static int getToAtoms( struct CtConnectionTable *ct, int *atomDist, double *molWeights, int idx,
                       int *ratom1, int *ratom2 )
{
    int i;
    int targetDistance;
    CtAtomBondData *abd;
    CtAtom *A;
    double bestw;
    int besta;

    A = ct->atoms + idx;
    targetDistance = atomDist[idx] + 1;
    bestw = -1.0;
    besta = -1;

    *ratom1 = *ratom2 = -1;

    for ( i = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomDist[ abd->toAtom ] == targetDistance )
        {
            if ( molWeights[ abd->toAtom ] > bestw )
            {
                bestw = molWeights[ abd->toAtom ];
                besta = abd->toAtom;
            }
        }
    }

    if ( besta == -1 )
        return -1;
    *ratom1 = besta;

    A = ct->atoms + besta;
    targetDistance = atomDist[besta] + 1;
    bestw = -1.0;
    besta = -1;

    for ( i = 0, abd = A->bond; i < A->bondCount; i++, abd++ )
    {
        if ( atomDist[ abd->toAtom ] == targetDistance )
        {
            if ( molWeights[ abd->toAtom ] > bestw )
            {
                bestw = molWeights[ abd->toAtom ];
                besta = abd->toAtom;
            }
        }
    }

    if ( besta == -1 )
        return -1;
    *ratom2 = besta;

    return 0;
}


static double *computePathWeights(struct CtConnectionTable *ct, int baseAtom, int *atomDist,
                                  int *featureMask, int *ctMap )
{
    int i, j, k;
    CtAtom *A;
    CtAtom *aptr;
    CtAtomBondData *abd;
    double *weights;
    int distance;
    int nextDistance;
    CtAtomBondData *found;
    double aweight;
    double *raw_weights;
    int toAtom;
    double adjval;
    static double maxadj = -1.0;
    static double feature_align = 1.0;
    FeatureType qfeature, strFeature;

    weights = (double *) calloc(sizeof(double), ct->atomCount );
    raw_weights = (double *) calloc(sizeof(double), ct->atomCount );

    for ( i = 0, aptr = ct->atoms; i < ct->atomCount; i++, aptr++ )
    {
        aweight = 0.0;
        DB_CT_GET_ATOMP_ATOMIC_WEIGHT(aptr, &aweight);
        raw_weights[i] = aweight;
    }

    if ( maxadj == -1.0 )
    {
        char *tptr;

        tptr = getenv("DBTOP_FEATURE_ALIGN_MAXADJ");
        if ( tptr )
        {
            maxadj = atof(tptr);
            fprintf(stderr, "Maximum feature adjustment for alignment: %8.2lf. Set from environment variable: DBTOP_FEATURE_ALIGN_MAXADJ\n", maxadj );
        }
        else
            maxadj = 50.0;

        tptr = getenv("DBTOP_FEATURE_ALIGN_SCALE");
        if ( tptr )
        {
            feature_align = atof(tptr);
            fprintf(stderr, "Feature alignment scaling factor: %8.2lf. Set from environment variable: DBTOP_FEATURE_ALIGN_SCALE\n", feature_align );
        }
        else
            feature_align = 0.5;

        if ( maxadj < 0.0 )
            maxadj = 0.0;
    }

    if ( feature_align < 0.0 )
        feature_align = 0.0;

    if ( q_featureFactor > 0.0 && maxadj > 0.0 && feature_align > 0.0 )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            if ( featureMask[ ctMap[i] ] == FeatureNone )
                continue; /* no single atom feature at this atom */

            for ( k = 0, adjval = 0.0, strFeature = featureMask[ ctMap[i] ]; k < 4; k++ )
            {
                if ( strFeature & fMasks[k] )
                    adjval += q_featureFactor * featureWeights[k+1] *
                              feature_align;
            }

            if ( adjval > maxadj )
                adjval = maxadj;
            raw_weights[i] += adjval;
        }
    }

    for ( i = 0, A = ct->atoms; i < ct->atomCount; i++, A++ )
    {
        if ( i == baseAtom )
            continue;
        aptr = A;

        distance = atomDist[i];
        nextDistance = distance - 1;
        toAtom = i;
        while ( distance )
        {
            weights[i] += raw_weights[toAtom];

            for ( found = (CtAtomBondData *) 0, j = 0, abd = aptr->bond;
                  !found && j < aptr->bondCount; j++, abd++ )
            {
                if ( atomDist[ abd->toAtom ] == nextDistance )
                    found = abd;
            }

            if ( found )
            {
                aptr = ct->atoms + found->toAtom;
                toAtom = found->toAtom;
                nextDistance--;
                distance--;
            }
            else
                distance = 0;
        }
    }

    free((char *) raw_weights );
    return weights;
}
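The final loop of `computePathWeights` gives each atom the sum of the raw weights met while stepping back toward the base atom, one bond at a time. A minimal sketch of that accumulation follows. It assumes a precomputed `parent` array (the neighbour one bond closer to the base, or -1 for the base itself) in place of the listing's search over `atomDist`, and it leaves the base atom's own weight out of every sum; whether the listing includes it depends on how `findDirectionalNeighbors` numbers distances, which is not shown here. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* parent[i]: neighbour of atom i one bond closer to the base, -1 for the base.
 * raw[i]:    per-atom raw weight (atomic weight plus any feature bonus).
 * Returns a calloc'd array of path weights; the caller frees it. */
static double *path_weights(int n, const int *parent, const double *raw)
{
    double *w = calloc(n, sizeof *w);

    if (!w)
        return NULL;
    for (int i = 0; i < n; i++) {
        /* Walk from atom i toward the base, summing raw weights on the way.
         * The step counter guards against a malformed (cyclic) parent array. */
        for (int cur = i, steps = 0; parent[cur] >= 0 && steps < n;
             cur = parent[cur], steps++)
            w[i] += raw[cur];
    }
    return w;
}
```

An atom far from the base therefore carries the weight of the whole chain leading to it, which is what lets the alignment code prefer the heavier of two branches.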

static int traverseBranch( struct CtConnectionTable *ct, int atomId, int *atomdist, double *molweight,
                           int rootToAtom, int *r_toatom, int *r_length, double *r_weight )
{
    CtAtom *a;
    CtAtomBondData *abd;
    int j;

    a = ct->atoms + atomId;
    if ( atomdist[ atomId ] > *r_length ||
         ( atomdist[ atomId ] == *r_length && molweight[atomId] > *r_weight ) )
    {
        *r_toatom = rootToAtom;
        *r_length = atomdist[atomId];
        *r_weight = molweight[atomId];
    }

    for ( j = 0, abd = a->bond; j < a->bondCount; j++, abd++ )
    {
        if ( atomdist[ abd->toAtom ] == ( atomdist[atomId] + 1 ) )
        {
#ifdef DEBUG_DETAIL
            if ( debug2 )
                fprintf(debug2, "# -> %d to %d dist: %d %d root:%d\n",
                        atomId, abd->toAtom, atomdist[abd->toAtom],
                        atomdist[atomId], rootToAtom );
#endif
            traverseBranch(ct, abd->toAtom, atomdist, molweight, rootToAtom, r_toatom,
                           r_length, r_weight );
        }
    }

    return 0;
}


/*
   Return an array containing the toAtom for each atom, which points to the
   largest chain based on size and then weight.
*/
static int *findLargestBranch(struct CtConnectionTable *ct, int *atomdist, double *weights )
{
    int *bi;
    int i, j;
    int toAtom;
    int length;
    double theWeight;
    CtAtomBondData *abd;
    CtAtom *atom;

    bi = (int *) calloc(sizeof(int), ct->atomCount );
    for ( i = 0; i < ct->atomCount; i++ )
    {
        atom = ct->atoms + i;
        toAtom = length = -1;
        theWeight = -1.0;

        for ( j = 0, abd = atom->bond; j < atom->bondCount; j++, abd++ )
        {
            if ( atomdist[ abd->toAtom ] == ( atomdist[i] + 1 ) )
            {
#ifdef DEBUG_DETAIL
                if ( debug2 )
                    fprintf(debug2, "# %d to %d dist:%d %d\n",
                            i, abd->toAtom, atomdist[abd->toAtom], atomdist[i] );
#endif
                traverseBranch(ct, abd->toAtom, atomdist, weights, abd->toAtom,
                               &toAtom, &length, &theWeight );
            }
        }

        bi[i] = toAtom;
    }

    return bi;
}
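Together, `traverseBranch` and `findLargestBranch` choose, for each atom, the outward neighbour that roots the longest chain, with the accumulated weight breaking ties. A minimal sketch of that selection on a simple tree follows. The `parent`/`depth` arrays and both function names are hypothetical stand-ins for the listing's connection table and `atomdist` array, and the structure is assumed to be acyclic.

```c
#include <assert.h>

/* Recursively visit the subtree rooted at `atom`, remembering the deepest
 * atom seen so far; among equally deep atoms the heavier one wins. */
static void deepest_in_branch(int n, const int *parent, const int *depth,
                              const double *weight, int atom,
                              int *bestLen, double *bestW)
{
    if (depth[atom] > *bestLen ||
        (depth[atom] == *bestLen && weight[atom] > *bestW)) {
        *bestLen = depth[atom];
        *bestW = weight[atom];
    }
    for (int j = 0; j < n; j++)
        if (parent[j] == atom)
            deepest_in_branch(n, parent, depth, weight, j, bestLen, bestW);
}

/* Return the child of `atom` whose branch reaches deepest (ties broken by
 * weight), or -1 if `atom` has no children. */
static int preferred_child(int n, const int *parent, const int *depth,
                           const double *weight, int atom)
{
    int bestChild = -1, bestLen = -1;
    double bestW = -1.0;

    for (int j = 0; j < n; j++) {
        int len = bestLen;
        double w = bestW;
        if (parent[j] != atom)
            continue;
        deepest_in_branch(n, parent, depth, weight, j, &len, &w);
        if (len != bestLen || w != bestW) {   /* this branch improved the best */
            bestChild = j;
            bestLen = len;
            bestW = w;
        }
    }
    return bestChild;
}
```

Calling `preferred_child` for every atom yields an array analogous to the `toAtoms` array that the alignment routine walks when it sets torsions.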


static double CompareTwoCompounds(Split *query, Split *str, double radius, int *r_qidx, int *r_sidx,
    int *r_splitidx, int *r_three, int *r_subsethit, double *r_best2, double *r_best3, double *r_bestsub,
    double *r_att_pen, int bailedout )
{
    double best;
    double best2, best3, bestsub;
    double d1, d2, d3, d4, d5, d6;
    double dval[6];
    double cdval[6];
    double attPen[2];
    int hevCnts[6];
    int bestQ, bestStr;
    int bestIdx;
    int threeIsBetter = 0;
    int SubIsBetter = 0;
    int id1, id2, id3, id4;
    int i, j, k, l;
    int ids[3];
    Frag *f, *sf;
    Frag *q1, *q2, *q3, *q4;
    Frag *fs1, *fs2, *fs3, *fs4;
    Frag *fragPtrs[3];
    Frag *qActive;
    split2 *qs2, *ss2;
    split3 *qs3, *ss3;
    double *dptr;
    double hexdiff;
    double fieldDiff;
    double outPen;
    double bailout;
    double *cf[6];
    int max3;
    static Split *qInit;

    *r_att_pen = 0.0;
    *r_qidx = bestQ = -1;

    if ( query->numFrags == 0 || str->numFrags == 0 )
        return 9999.0;

    bailout = radius*radius;

    regid = (char *) 0;

    DB_CT_GET_CT_ATTR(str->ct, CtCtRegId, &regid );
    if ( !regid )
        DB_CT_GET_CT_ATTR(str->ct, CtCtName, &regid );

#ifdef USE_HEX
    if ( qInit != query )
    {
        for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
            if ( f->topHex )
                f->topInt = hexStringToInts(f->topHex, &(f->topIntSize) );

        qInit = query;
    }
#endif

    for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
    {
        if ( f->hexDiff )
            free((char *) f->hexDiff );
#ifdef STD_REGION
        if ( f->stdDiff )
            free((char *) f->stdDiff );
#endif

        f->hexDiff = (double *) calloc(str->numFrags, sizeof(double) );
#ifdef STD_REGION
        f->stdDiff = (double *) calloc(str->numFrags, sizeof(double) );
#endif

        for ( j = 0; j < str->numFrags; j++ )
        {
            f->hexDiff[j] = -1.0;
#ifdef STD_REGION
            f->stdDiff[j] = -1.0;
#endif
        }
    }

#ifdef USE_HEX
    for ( i = 0, f = str->frags; i < str->numFrags; i++, f++ )
        if ( f->topHex )
            f->topInt = hexStringToInts(f->topHex, &(f->topIntSize) );
#endif

#ifdef CALCBATCHDIFF
    for ( i = 0, f = query->frags; i < query->numFrags; i++, f++ )
    {
        for ( j = 0, sf = str->frags; j < str->numFrags; j++, sf++ )
        {
#ifdef USE_HEX
            f->hexDiff[j] = fieldIntDiff(f->topInt, sf->topInt, f->topIntSize,
                                         sf->topIntSize);
#else
            f->hexDiff[j] = topFieldDiff(f->topField, sf->topField, str->npoints );
#endif
            if ( f->featureDiff )
                f->featureDiff[j] = compareFeatures(query, f, str, sf, -1, -1 );

#if 0
            fieldDiff = topFieldDiff(f->topField, sf->topField, str->npoints );
            fprintf(stderr, "hexvsraw: hex:%7.4lf field: %7.4lf diff:%7.4lf \n",
                    f->hexDiff[j], fieldDiff, fieldDiff - f->hexDiff[j] );
#endif
#if 0
            hexdiff = fieldHexDiff(f->topHex, sf->topHex, 0 );
            hexdiff *= hexdiff;

            if ( fabs( hexdiff - f->hexDiff[j] ) > 0.0001 )
                fprintf(stderr, "field diff: %8.6lf %8.6lf %8.5lf \n",
                        hexdiff, f->hexDiff[j],
                        hexdiff - f->hexDiff[j] );
#endif
        }
    }
#endif

#if 0
    fprintf(stderr, "s2 cnts:%d %d\n", query->s2cnt, str->s2cnt );
    fflush(stderr);
#endif

    best = best2 = best3 = bestsub = 9999.0 * 9999.0;

    /*
       2 piece steric field comparison
    */
    if ( query->s2 && str->s2 && q_do2piece )
    {
        for ( i = 0, qs2 = query->s2; i < query->s2cnt; i++, qs2++ )
        {
            if ( qs2->frag1 == -1 || qs2->frag2 == -1 )
                continue;

            q1 = query->frags + qs2->frag1;
            q2 = query->frags + qs2->frag2;
            if ( q_partialMatch )
            {
                q1->featureDiff = q1->feature2PDiff;
                q2->featureDiff = q2->feature2PDiff;
            }

            for ( j = 0, ss2 = str->s2; j < str->s2cnt; j++, ss2++ )
            {
                if ( ss2->frag1 == -1 || ss2->frag2 == -1 )
                    continue;

#ifndef NO_STRMAP
                if ( qs2->strMap && qs2->strMap[j] == 0 )
                    continue; /* feature throws this one out */
#endif

                id1 = (str->frags + ss2->frag1)->id;
                id2 = (str->frags + ss2->frag2)->id;

                fs1 = str->frags + ss2->frag1;
                fs2 = str->frags + ss2->frag2;
                t_2compare++;

#if 0
                fprintf(stderr, "ids %d: %d %d\n", j, id1, id2 );
                fflush(stderr);
#endif

                outPen = fs1->outsidePenalty + fs2->outsidePenalty;
                if ( outPen )
                {
                    if ( outPen > bailout )
                    {
                        continue;
                    }
                }

#ifdef NO_COMPRESSION
                q1->hexDiff[id1] = topFieldDiff(q1->qtf[fs1->regionIdx], fs1->topField,
                                                fs1->npoints );
                q1->hexDiff[id2] = topFieldDiff(q1->qtf[fs2->regionIdx], fs2->topField,
                                                fs2->npoints );
                q2->hexDiff[id1] = topFieldDiff(q2->qtf[fs1->regionIdx], fs1->topField,
                                                fs1->npoints );
                q2->hexDiff[id2] = topFieldDiff(q2->qtf[fs2->regionIdx], fs2->topField,
                                                fs2->npoints );
#else
                if ( q_featureFactor > 0.0 && q1->featureDiff && q2->featureDiff )
                {
                    q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                            fs1->topField, fs1->npoints, q1->featureDiff[id1] );

                    q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                            fs2->topField, fs2->npoints, q1->featureDiff[id2] );

                    q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                            fs1->topField, fs1->npoints, q2->featureDiff[id1] );

                    q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                            fs2->topField, fs2->npoints, q2->featureDiff[id2] );
                }
                else
                {
                    q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                            fs1->topField, fs1->npoints, 0.0 );

                    q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                            fs2->topField, fs2->npoints, 0.0 );

                    q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                            fs1->topField, fs1->npoints, 0.0 );

                    q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                            fs2->topField, fs2->npoints, 0.0 );
                }
#endif

#ifdef NO_COMPRESSION 
#ifdef COMPRESS_COMPARE 

cf[0] = compressField(ql->qtf[fsl-> regionldx], fsl-> npoints ); 
cf[4] = cornpressField(fsl-> topField, fsl- > npoints ); 
cdval[0] = topFieldCompressedDiff( cf[0], cfT4], fsl- > npoints ); 
fprintf(stderr," Compressed varies by %7.21f %6.21f %6.21f \n", 

fabs(ql->hexDiff[idl] - cdval[0]), ql->hexDiff[idl], cdval[0] ); 
i 

cf[l] = compressField(ql->qtf[fs2-> regionldx], fs2-> npoints ); 

cf[5] = compressField(fs2-> topField, fs2- > npoints ); 

cdval[l] = topFieldCompressedDiff( cf[l], cf[5], fs2-> npoints ); 
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fprintf(stderr, "Compressed varies by %7.21f %6.21f %6.21f \n", 

fabs(ql->hexDiff[id2] - cdval[l]), ql->hexDiff[id2], cdval[l] ); 

free((char *) cf[4] ); 

cf[2] = compressField(q2->qtf[fsl->regionIdx], fsl->npoints ); 
cf[4] = compressField(fsl->topField, fsl->npoints ); 
cdval[2] = topFieldCompressedDiff( cf[2], cf[4], fsl->npoints ); 
fprintf(stderr, "Compressed varies by %7.21f %6.21f %6.21f \n", 

fabs(q2->hexDiff[idl] - cdval[2]), q2->hexDiff[idl], cdval[2] ); 

free((char *) cf[5] ); 

cf[3] = compressField(q2->qtf[fs2->regionIdx], fs2->npoints ); 

cf[5] = compressField(fs2->topField, fs2->npoints ); 

cdval[3] = topFieldCompressedDiff( cf[3], cf[5], fs2->npoints ); 

fprintf(stderr, "Compressed varies by %7.21f %6.21f %6.21f \n\ 

fabs(q2->hexDiff[id2] - cdval[3]), q2->hexDiff[id2], cdval[3] ); 


free((char *) cf[0] ); 
20 free((char '*) cf[l] ); 

4 free((char *) cf[2] ); 
£ free((char *) cf[3] ); 
W. free((char *) cf[5] ); 
^ free((char *) cf[4] ); 

25j #endif 

f #endif . 
00 #ifdef STD_REGION 

3 ¥ ql->stdDiff[idl] = topFieldDiff(ql->stdField, fsl- > stdField, 

2: stdRegion- > n_points ); 

5 ql->stdDiff[id2] = topFieldDiff(ql- > stdField, fs2- > stdField, 
stdRegion->n_points ); 

J q2->stdDiff[idl] = topFieldDiff(q2-> stdField, fsl- > stdField, 

35 stdRegion- > n_points ); 

q2->stdDiff[id2] = topFieldDiff(q2-> stdField, fs2-> stdField, 

stdRegion- >n_points ); 

if ( q_debugfp && ( 

40 ( ql->hexDiff[idl] - ql->stdDiff[idl] ) > 9.0 j | 

( ql->hexDiff[id2] - ql->stdDiff[id2] ) > 9.0 j j 
( q2->hexDiff{idl] - q2->stdDiff[idl] ) > 9.0 | j 
^ ( q2->hexDiff[id2] - q2- > stdDiff[id2] ) > 9.0 ) ) 

45 fprintf(q_debugfp, "region diffs: %d.%d %6.21f %6.21f %6.21f %6.21f 


(idx: %d %d) \n", 


i+lj + l, 

ql->hexDiff[idl] - ql->stdDiff[idl], 
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ql->hexDiff[id2] - ql->stdDiff[id2], 
q2->hexDiff[idl] - q2->stdDiff[idl], 
q2->hexDiff[id2] - q2->stdDiff[id2], 

fsl->regionIdx, fs2->regionIdx ); 


#endif 


if ( q_featureFactor > 0.0 ) 
{ 

dl = ql->hexDiff[idl] + q2->hexDiff[id2] + ql->featureDiff[idl] 
+ q2->featureDiff[id2] + outPen; 

d2 = ql->hexDiff[id2] + q2->hexDiff[idl] + ql->featureDiff[id2] 
+ q2->featureDiff[idl] + outPen; 

} 

else 
{ 

dl = ql->hexDiff[idl] + q2->hexDiff[id2] + outPen; 
d2 = ql->hexDiff[id2] + q2->hexDiff[idl] + outPen; 

if ( dl < best ) 
{ 

bestQ = i; 
bestStr = j; 
best = best2 = dl; 
bestldx = 0; 

} 

if ( d2 < ibest ) 
{ 

bestQ = i; 
bestStr = j; 
best = best2 = d2; 
bestldx = 1; 


#if0 


} 
} 

fprintf(stderr,"s3 cnts:%d %d\n", query- >s3cnt, str->s3cnt ); 
fflush(stderr); 


#endif 

/*
    3 piece steric field comparison
*/

for ( i = 0, qs3 = query->s3; q_do3piece && qs3 && i < query->s3cnt; i++, qs3++ )
{
    if ( qs3->frag1 == -1 || qs3->frag2 == -1 || qs3->frag3 == -1 )
        continue;

    q1 = query->frags + qs3->frag1;
    q2 = query->frags + qs3->frag2;
    q3 = query->frags + qs3->frag3;
    q4 = query->frags + qs3->frag4;
    if ( q_partialMatch )
    {
        q1->featureDiff = q1->feature3PDiff;
        q2->featureDiff = q2->feature3PDiff;
        q3->featureDiff = q3->feature3PDiff;
        q4->featureDiff = q4->feature3PDiff;
    }

    for ( j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
    {
        if ( ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
            continue;

#ifndef NO_STRMAP
        if ( qs3->strMap && qs3->strMap[j] == 0 )
            continue; /* can't hit this 3 piece combination because
                         features throws it out */
#endif

        fs1 = str->frags + ss3->frag1;
        fs2 = str->frags + ss3->frag2;
        fs3 = str->frags + ss3->frag3;
        fs4 = str->frags + ss3->frag4;
        id1 = fs1->id;
        id2 = fs2->id;
        id3 = fs3->id;
        id4 = fs4->id;

        t_3compare++;

#ifdef NO_COMPRESSION
#ifdef USE_HEX
        if ( q1->hexDiff[id1] == -1.0 )
            q1->hexDiff[id1] = fieldIntDiff(q1->topInt, fs1->topInt,
                                            q1->topIntSize, fs1->topIntSize);
        if ( q1->hexDiff[id4] == -1.0 )
            q1->hexDiff[id4] = fieldIntDiff(q1->topInt, fs4->topInt,
                                            q1->topIntSize, fs4->topIntSize);
        if ( q4->hexDiff[id1] == -1.0 )
            q4->hexDiff[id1] = fieldIntDiff(q4->topInt, fs1->topInt,
                                            q4->topIntSize, fs1->topIntSize);
        if ( q4->hexDiff[id4] == -1.0 )
            q4->hexDiff[id4] = fieldIntDiff(q4->topInt, fs4->topInt,
                                            q4->topIntSize, fs4->topIntSize);
        if ( q2->hexDiff[id2] == -1.0 )
            q2->hexDiff[id2] = fieldIntDiff(q2->topInt, fs2->topInt,
                                            q2->topIntSize, fs2->topIntSize);
        if ( q2->hexDiff[id3] == -1.0 )
            q2->hexDiff[id3] = fieldIntDiff(q2->topInt, fs3->topInt,
                                            q2->topIntSize, fs3->topIntSize);
        if ( q3->hexDiff[id3] == -1 )
            q3->hexDiff[id3] = fieldIntDiff(q3->topInt, fs3->topInt,
                                            q3->topIntSize, fs3->topIntSize);
        if ( q3->hexDiff[id2] == -1 )
            q3->hexDiff[id2] = fieldIntDiff(q3->topInt, fs2->topInt,
                                            q3->topIntSize, fs2->topIntSize);
#else
        if ( q1->hexDiff[id1] == -1.0 )
            q1->hexDiff[id1] = topFieldDiff(q1->qtf[fs1->regionIdx],
                                            fs1->topField, fs1->npoints );
        if ( q1->hexDiff[id4] == -1.0 )
            q1->hexDiff[id4] = topFieldDiff(q1->qtf[fs4->regionIdx],
                                            fs4->topField, fs4->npoints );
        if ( q4->hexDiff[id1] == -1.0 )
            q4->hexDiff[id1] = topFieldDiff(q4->qtf[fs1->regionIdx],
                                            fs1->topField, fs1->npoints );
        if ( q4->hexDiff[id4] == -1.0 )
            q4->hexDiff[id4] = topFieldDiff(q4->qtf[fs4->regionIdx],
                                            fs4->topField, fs4->npoints );
        if ( q2->hexDiff[id2] == -1.0 )
            q2->hexDiff[id2] = topFieldDiff(q2->qtf[fs2->regionIdx],
                                            fs2->topField, fs2->npoints );
        if ( q2->hexDiff[id3] == -1.0 )
            q2->hexDiff[id3] = topFieldDiff(q2->qtf[fs3->regionIdx],
                                            fs3->topField, fs3->npoints );
        if ( q3->hexDiff[id3] == -1 )
            q3->hexDiff[id3] = topFieldDiff(q3->qtf[fs3->regionIdx],
                                            fs3->topField, fs3->npoints );
        if ( q3->hexDiff[id2] == -1 )
            q3->hexDiff[id2] = topFieldDiff(q3->qtf[fs2->regionIdx],
                                            fs2->topField, fs2->npoints );

        outPen = ( (fs1->outsidePenalty + fs2->outsidePenalty) / 2.0 ) +
                 fs2->outsidePenalty + fs3->outsidePenalty;
#endif
#endif

#ifndef NO_COMPRESSION
        if ( q1->hexDiff[id1] == -1.0 )
            q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
        if ( q1->hexDiff[id4] == -1.0 )
            q1->hexDiff[id4] = topFieldCompressedDiff(q1->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
        if ( q4->hexDiff[id1] == -1.0 )
            q4->hexDiff[id1] = topFieldCompressedDiff(q4->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
        if ( q4->hexDiff[id4] == -1.0 )
            q4->hexDiff[id4] = topFieldCompressedDiff(q4->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
        if ( q2->hexDiff[id2] == -1.0 )
            q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints,
                    q2->featureDiff ? q2->featureDiff[id2] : 0.0 );
        if ( q2->hexDiff[id3] == -1.0 )
            q2->hexDiff[id3] = topFieldCompressedDiff(q2->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints,
                    q2->featureDiff ? q2->featureDiff[id3] : 0.0 );
        if ( q3->hexDiff[id3] == -1 )
            q3->hexDiff[id3] = topFieldCompressedDiff(q3->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints,
                    q3->featureDiff ? q3->featureDiff[id3] : 0.0 );
        if ( q3->hexDiff[id2] == -1 )
            q3->hexDiff[id2] = topFieldCompressedDiff(q3->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints,
                    q3->featureDiff ? q3->featureDiff[id2] : 0.0 );

        outPen = ( (fs1->outsidePenalty + fs2->outsidePenalty) / 2.0 ) +
                 fs2->outsidePenalty + fs3->outsidePenalty;
#endif

#ifdef STD_REGION_3P
        q1->stdDiff[id1] = topFieldDiff(q1->stdField, fs1->stdField, stdRegion->n_points );
        q4->stdDiff[id1] = topFieldDiff(q4->stdField, fs1->stdField, stdRegion->n_points );
        q1->stdDiff[id4] = topFieldDiff(q1->stdField, fs4->stdField, stdRegion->n_points );
        q4->stdDiff[id4] = topFieldDiff(q4->stdField, fs4->stdField, stdRegion->n_points );
        q2->stdDiff[id2] = topFieldDiff(q2->stdField, fs2->stdField, stdRegion->n_points );
        q2->stdDiff[id3] = topFieldDiff(q2->stdField, fs3->stdField, stdRegion->n_points );
        q3->stdDiff[id3] = topFieldDiff(q3->stdField, fs3->stdField, stdRegion->n_points );
        q3->stdDiff[id2] = topFieldDiff(q3->stdField, fs2->stdField, stdRegion->n_points );

        fprintf(stderr, "# region diffs %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf %6.2lf (idx: %d %d %d %d) out:%6.2lf\n",
                q1->hexDiff[id1] - q1->stdDiff[id1],
                q1->hexDiff[id4] - q1->stdDiff[id4],
                q4->hexDiff[id1] - q4->stdDiff[id1],
                q4->hexDiff[id4] - q4->stdDiff[id4],
                q2->hexDiff[id2] - q2->stdDiff[id2],
                q2->hexDiff[id3] - q2->stdDiff[id3],
                q3->hexDiff[id3] - q3->stdDiff[id3],
                q3->hexDiff[id2] - q3->stdDiff[id2],
                fs1->regionIdx, fs4->regionIdx, fs2->regionIdx,
                fs3->regionIdx, outPen );
#endif

        attPen[0] = attPen[1] = 0.0;

        dval[0] = (q1->hexDiff[id1] + q4->hexDiff[id4]) / 2.0 + q2->hexDiff[id2]
                  + q3->hexDiff[id3];
        dval[1] = (q1->hexDiff[id4] + q4->hexDiff[id1]) / 2.0 + q2->hexDiff[id3]
                  + q3->hexDiff[id2];

        if ( outPen > 0.0 )
        {
            dval[0] += outPen;
            dval[1] += outPen;
        }

        if ( q_attachPenFactor > 0.0 )
        {
            attPen[0] = ( computeAttachmentPenalty( q1, fs1, q4, fs4 ) +
                          computeAttachmentPenalty( q4, fs4, q1, fs1 ) );
            attPen[1] = ( computeAttachmentPenalty( q1, fs4, q4, fs1 ) +
                          computeAttachmentPenalty( q4, fs1, q1, fs4 ) );

            dval[0] += attPen[0];
            dval[1] += attPen[1];
        }

        if ( q_featureFactor > 0.0 )
        {
            dval[0] += ( q1->featureDiff[id1] + q4->featureDiff[id4] ) / 2.0 +
                       q2->featureDiff[id2] + q3->featureDiff[id3];
            dval[1] += ( q1->featureDiff[id4] + q4->featureDiff[id1] ) / 2.0 +
                       q2->featureDiff[id3] + q3->featureDiff[id2];
        }

        max3 = 2;

        if ( dval[0] < 0.0 )
        {
#if 0
            if ( q_debugfp )
                fprintf(q_debugfp, "3 below zero #0 %8.4lf %8.4lf %8.4lf %8.4lf (%d %d %d %d) \n",
                        q1->featureDiff[id1], q4->featureDiff[id4],
                        q2->featureDiff[id2], q3->featureDiff[id3],
                        id1, id4, id2, id3 );
#endif
            dval[0] = 0.0;
        }

        if ( dval[1] < 0.0 )
        {
#if 0
            if ( q_debugfp )
                fprintf(q_debugfp, "3 below zero #1 %8.4lf %8.4lf %8.4lf %8.4lf (%d %d %d %d)\n",
                        q1->featureDiff[id4], q4->featureDiff[id1],
                        q2->featureDiff[id3], q3->featureDiff[id2],
                        id4, id1, id3, id2 );
#endif
            dval[1] = 0.0;
        }

        for ( k = 0; k < max3; k++ )
        {
            if ( dval[k] < best )
            {
                best = best3 = dval[k];
                bestQ = i;
                bestStr = j;
                bestIdx = k;
                threeIsBetter = 1;
                *r_att_pen = attPen[k] > 0.0 ? sqrt(attPen[k]) : 0.0;
            }
            else if ( dval[k] < best3 )
                best3 = dval[k];
        }
    }
}


/* 

subset steric field comparison 
*/ 


if ( query->s2 && str->s3 && q_doSubset )
{
    /* loop over query 2 piece fragments, and compare with the structure's
       3 piece fragments. */
    for ( i = 0, qs2 = query->s2; i < query->s2cnt; i++, qs2++ )
    {
        if ( qs2->frag1 == -1 || qs2->frag2 == -1 )
            continue;

        q1 = query->frags + qs2->frag1;
        q2 = query->frags + qs2->frag2;
        if ( q_partialMatch )
        {
            q1->featureDiff = q1->featureSubsetDiff;
            q2->featureDiff = q2->featureSubsetDiff;
        }

        for ( j = 0, ss3 = str->s3; ss3 && j < str->s3cnt; j++, ss3++ )
        {
            if ( ss3->frag1 == -1 || ss3->frag2 == -1 || ss3->frag3 == -1 )
                continue;

            if ( qs2->subsetMap && qs2->subsetMap[j] == 0 )
                continue; /* feature throws this one out */

            fs1 = str->frags + ss3->frag1;
            fs2 = str->frags + ss3->frag2;
            fs3 = str->frags + ss3->frag3;
            fs4 = str->frags + ss3->frag4;
            id1 = fs1->id;
            id2 = fs2->id;
            id3 = fs3->id;
            id4 = fs4->id;

#if 1
            if ( q1->hexDiff[id1] == -1.0 )
                q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, 0.0 );
            if ( q1->hexDiff[id2] == -1.0 )
                q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, 0.0 );
            if ( q2->hexDiff[id1] == -1.0 )
                q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                        fs1->topField, fs1->npoints, 0.0 );
            if ( q2->hexDiff[id2] == -1.0 )
                q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                        fs2->topField, fs2->npoints, 0.0 );
            if ( q1->hexDiff[id3] == -1.0 )
                q1->hexDiff[id3] = topFieldCompressedDiff(q1->qtf[fs3->regionIdx],
                        fs3->topField, fs3->npoints, 0.0 );
            if ( q1->hexDiff[id4] == -1.0 )
                q1->hexDiff[id4] = topFieldCompressedDiff(q1->qtf[fs4->regionIdx],
                        fs4->topField, fs4->npoints, 0.0 );
            if ( q2->hexDiff[id3] == -1.0 )
                q2->hexDiff[id3] = topFieldCompressedDiff(q2->qtf[fs3->regionIdx],
                        fs3->topField, fs3->npoints, 0.0 );
            if ( q2->hexDiff[id4] == -1.0 )
                q2->hexDiff[id4] = topFieldCompressedDiff(q2->qtf[fs4->regionIdx],
                        fs4->topField, fs4->npoints, 0.0 );
#else
            q1->hexDiff[id1] = topFieldCompressedDiff(q1->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
            q1->hexDiff[id2] = topFieldCompressedDiff(q1->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints, 0.0 );
            q2->hexDiff[id1] = topFieldCompressedDiff(q2->qtf[fs1->regionIdx],
                    fs1->topField, fs1->npoints, 0.0 );
            q2->hexDiff[id2] = topFieldCompressedDiff(q2->qtf[fs2->regionIdx],
                    fs2->topField, fs2->npoints, 0.0 );
            q1->hexDiff[id3] = topFieldCompressedDiff(q1->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints, 0.0 );
            q1->hexDiff[id4] = topFieldCompressedDiff(q1->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
            q2->hexDiff[id3] = topFieldCompressedDiff(q2->qtf[fs3->regionIdx],
                    fs3->topField, fs3->npoints, 0.0 );
            q2->hexDiff[id4] = topFieldCompressedDiff(q2->qtf[fs4->regionIdx],
                    fs4->topField, fs4->npoints, 0.0 );
#endif

            if ( q_featureFactor > 0.0 )
            {
                dval[0] = q1->featureDiff[id1] + q2->featureDiff[id2];
                dval[1] = q1->featureDiff[id2] + q2->featureDiff[id1];
                dval[2] = q1->featureDiff[id3] + q2->featureDiff[id4];
                dval[3] = q1->featureDiff[id4] + q2->featureDiff[id3];
            }
            else
                dval[0] = dval[1] = dval[2] = dval[3] = 0.0;

            dval[0] += q1->hexDiff[id1] + q2->hexDiff[id2];
            dval[1] += q1->hexDiff[id2] + q2->hexDiff[id1];
            dval[2] += q1->hexDiff[id3] + q2->hexDiff[id4];
            dval[3] += q1->hexDiff[id4] + q2->hexDiff[id3];

#if 0
            fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id1, id2, q1->featureDiff[id1],
                    q2->featureDiff[id2], q1->hexDiff[id1], q2->hexDiff[id2] );
            fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id2, id1, q1->featureDiff[id2],
                    q2->featureDiff[id1], q1->hexDiff[id2], q2->hexDiff[id1] );
            fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id3, id4, q1->featureDiff[id3],
                    q2->featureDiff[id4], q1->hexDiff[id3], q2->hexDiff[id4] );
            fprintf(stderr,"%d %d with %d %d Feature: %8.2lf %8.2lf Steric: %8.2lf %8.2lf\n",
                    q1->id, q2->id, id4, id3, q1->featureDiff[id4],
                    q2->featureDiff[id3], q1->hexDiff[id4], q2->hexDiff[id3] );
#endif

            hevCnts[0] = hevCnts[1] = fs1->hevCnt + fs2->hevCnt;
            hevCnts[2] = hevCnts[3] = fs3->hevCnt + fs4->hevCnt;

#if 0
            fprintf(stderr,"dvals: %8.2lf %8.2lf %8.2lf %8.2lf \n", dval[0], dval[1],
                    dval[2], dval[3] );
            fprintf(stderr,"hevCnts: %d %d min:%d\n", hevCnts[0], hevCnts[1],
                    q_minSubsetSize );
#endif

            max3 = 4;

            for ( k = 0; k < max3; k++ )
            {
                if ( hevCnts[k] >= q_minSubsetSize )
                {
                    if ( dval[k] < best )
                    {
                        best = bestsub = dval[k];
                        bestQ = i;
                        bestStr = j;
                        bestIdx = k;
                        SubIsBetter = 1;
                    }
                    else if ( dval[k] < bestsub )
                        bestsub = dval[k];

                    if ( dval[k] < q_bailout && qs2->subsetMap[j] == 0 )
                    {
                        qs2->subsetMap[j] = 1;
                    }
                }
            }
        }
    }
} /* end of subset */

#ifdef DEBUG_DETAIL
if ( debug2 )
{
    /* dump array of difference matrix values */
    if ( regid )
        fprintf(debug2, "%s\n", regid );

    for ( i = 0; i < query->numFrags; i++ )
    {
        fs1 = query->frags + i;
        dptr = fs1->hexDiff;
        for ( j = 0; j < str->numFrags; j++ )
        {
            fprintf(debug2,"%7.2lf ", *(dptr+j) );
        }
        fprintf(debug2,"\n");
    }

    fprintf(debug2,"\n");
    for ( i = 0; i < query->numFrags; i++ )
    {
        fs1 = query->frags + i;
        for ( j = 0; j < str->numFrags; j++ )
        {
            fs2 = str->frags + j;
            fprintf(debug2,"%3d,%3d ", fs1->atomCnt, fs2->atomCnt );
        }
        fprintf(debug2,"\n");
    }

    fprintf(debug2,"\n");
    fprintf(debug2, "Query split 2\n");
    for ( i = 0; i < query->s2cnt; i++ )
    {
        qs2 = query->s2 + i;
        fprintf(debug2,"%d %d\n", qs2->frag1, qs2->frag2 );
    }

    fprintf(debug2,"\nStr split 2\n");
    for ( i = 0; i < str->s2cnt; i++ )
    {
        qs2 = str->s2 + i;
        fprintf(debug2,"%d %d\n", qs2->frag1, qs2->frag2 );
    }

    fprintf(debug2,"\nQuery split 3\n");
    for ( i = 0; i < query->s3cnt; i++ )
    {
        qs3 = query->s3 + i;
        fprintf(debug2,"%d %d %d\n", qs3->frag1, qs3->frag2, qs3->frag3 );
    }

    fprintf(debug2,"\nStr split 3\n");
    for ( i = 0; i < str->s3cnt; i++ )
    {
        qs3 = str->s3 + i;
        fprintf(debug2,"%d %d %d\n", qs3->frag1, qs3->frag2, qs3->frag3 );
    }
    fprintf(debug2,"\n");
}
#endif


#if 0
fprintf(stderr,"done with this one\n");
fflush(stderr);
#endif
if ( q_debugfp )
    fprintf(q_debugfp, "q %d str: %d idx %d 3is %dsubis %dbest2 %8.4lf best3 %8.4lf bestsub %8.4lf \n",
            bestQ, bestStr, bestIdx, threeIsBetter, SubIsBetter, best2, best3, bestsub );
*r_qidx = bestQ;
*r_sidx = bestStr;
*r_splitidx = bestIdx;
*r_three = threeIsBetter;
*r_subsethit = SubIsBetter;
if ( best2 < 0.0 )
    best2 = 0.0;
*r_best2 = sqrt(best2);
if ( best3 < 0.0 )
    best3 = 0.0;
*r_best3 = sqrt(best3);
*r_bestsub = sqrt(bestsub);
if ( best < 0.0 )
    best = 0.0;
return sqrt(best);

}
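
The two piece comparison in the function above amounts to scoring both possible pairings of two query fragments with two structure fragments and keeping the cheaper one. The following is only a minimal sketch of that step, not the listing's code: the helper name `pick_two_piece_pairing`, the 2x2 array `diff`, and the parameter `out_pen` are hypothetical names introduced for illustration. It returns the squared score; the listing takes the square root when it returns.

```c
/* Hypothetical helper (not part of the listing).
 * diff[q][s] is assumed to hold the squared difference between query
 * fragment q and structure fragment s (steric plus any feature term).
 * Returns the smaller summed score, clamped at zero, and stores the
 * winning pairing in *pairing:
 *   0 = (q1 with s1, q2 with s2), 1 = (q1 with s2, q2 with s1). */
static double pick_two_piece_pairing(double diff[2][2], double out_pen, int *pairing)
{
    double d1 = diff[0][0] + diff[1][1] + out_pen; /* straight pairing */
    double d2 = diff[0][1] + diff[1][0] + out_pen; /* swapped pairing  */
    double best = d1;

    *pairing = 0;
    if (d2 < best) {
        best = d2;
        *pairing = 1;
    }
    if (best < 0.0)   /* the listing clamps negatives before sqrt() */
        best = 0.0;
    return best;
}
```

Keeping the result squared until the end mirrors the listing, which accumulates squared differences in `best` and applies `sqrt` only once on return.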

static int get_details( topresult *res, Split *query, Split *str,
        int bestq, int bestStr, int bestIdx, int threeMatched, int subsetHit, int keepCts )
{
    split2 *qs2, *s2;
    split3 *qs3, *s3;
    int ids[3];
    Frag *f;
    Frag *sf;

    if ( subsetHit )
    {
        threeMatched = 0;

        if ( bestq < 0 || bestq >= query->s2cnt )
            return -1;

        if ( bestStr < 0 || bestStr >= str->s3cnt )
            return -1;

        qs2 = query->s2 + bestq;
        s3 = str->s3 + bestStr;
        switch ( bestIdx )
        {
#if 0
            dval[0] += q1->hexDiff[id1] + q2->hexDiff[id2];
            dval[1] += q1->hexDiff[id2] + q2->hexDiff[id1];
            dval[2] += q1->hexDiff[id3] + q2->hexDiff[id4];
            dval[3] += q1->hexDiff[id4] + q2->hexDiff[id3];
#endif
        case 0:
            ids[0] = s3->frag1;
            ids[1] = s3->frag2;
            break;

        case 1:
            ids[0] = s3->frag2;
            ids[1] = s3->frag1;
            break;

        case 2:
            ids[0] = s3->frag3;
            ids[1] = s3->frag4;
            break;

        case 3:
            ids[0] = s3->frag4;
            ids[1] = s3->frag3;
            break;

        default:
            return -1;
        }

        f = query->frags + qs2->frag1;
        sf = str->frags + ids[0];
        res->qids[0] = f->id;
        res->outside[0] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[0] = f->ct;
            res->hexDiffs[0] = sqrt( f->hexDiff[ ids[0] ] );
            if ( q_partialMatch )
                f->featureDiff = f->featureSubsetDiff;
            if ( f->featureDiff )
                res->featureDiffs[0] = sqrt( f->featureDiff[ ids[0] ] );
            else
                res->featureDiffs[0] = 0.0;
        }
        else
        {
            res->hexDiffs[0] = 1.0;
            res->featureDiffs[0] = 1.0;
        }

        if ( sf->ct && keepCts )
            res->strFrags[0] = makeFragCopy(sf->ct, ids[0], -1 );

        f = query->frags + qs2->frag2;
        sf = str->frags + ids[1];
        res->qids[1] = f->id;
        res->outside[1] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[1] = f->ct;
            res->hexDiffs[1] = sqrt( f->hexDiff[ ids[1] ] );
            if ( q_partialMatch )
                f->featureDiff = f->featureSubsetDiff;
            if ( f->featureDiff )
                res->featureDiffs[1] = sqrt( f->featureDiff[ ids[1] ] );
            else
                res->featureDiffs[1] = 0.0;
        }
        else
        {
            res->hexDiffs[1] = 1.0;
            res->featureDiffs[1] = 1.0;
        }

        if ( sf->ct && keepCts )
            res->strFrags[1] = makeFragCopy(sf->ct, ids[1], -1 );
        res->qids[2] = -1;
        res->strids[0] = ids[0];
        res->strids[1] = ids[1];
        res->strids[2] = -1;
    }
    else if ( threeMatched )
    {
        qs3 = query->s3 + bestq;
        s3 = str->s3 + bestStr;

        switch ( bestIdx )
        {
        case 0:
        case 2:
            ids[0] = s3->frag1;
            ids[1] = s3->frag2;
            ids[2] = s3->frag3;
            break;

        case 1:
        case 3:
            ids[0] = s3->frag4;
            ids[1] = s3->frag3;
            ids[2] = s3->frag2;
            break;

#if 0
        case 2:
            ids[0] = s3->frag2;
            ids[1] = s3->frag1;
            ids[2] = s3->frag3;
            break;

        case 3:
            ids[0] = s3->frag2;
            ids[1] = s3->frag3;
            ids[2] = s3->frag1;
            break;

        case 4:
            ids[0] = s3->frag3;
            ids[1] = s3->frag2;
            ids[2] = s3->frag1;
            break;

        case 5:
            ids[0] = s3->frag3;
            ids[1] = s3->frag1;
            ids[2] = s3->frag2;
            break;
#endif

        default:
            return -1;
        }

        res->hexDiffs[0] = res->hexDiffs[1] = res->hexDiffs[2] = 0.0;

        /* always use the first query fragment for the center piece,
           report the corresponding best hit (avg anyway) fragment from the
           structure fragment */
        f = query->frags + qs3->frag1;

        res->qids[0] = f->id;
        if ( f->ct )
        {
            res->qFrags[0] = f->ct;
            res->hexDiffs[0] = sqrt( f->hexDiff[ ids[0] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature3PDiff;
            if ( f->featureDiff )
                res->featureDiffs[0] = sqrt( f->featureDiff[ ids[0] ] );
            else
                res->featureDiffs[0] = 0.0;
        }

        f = str->frags + ids[0];
        if ( f->ct && keepCts )
            res->strFrags[0] = makeFragCopy(f->ct, ids[0], -1 );
        res->outside[0] = f->outside;

        f = query->frags + qs3->frag2;
        res->qids[1] = f->id;
        if ( f->ct )
        {
            res->qFrags[1] = f->ct;
            res->hexDiffs[1] = sqrt( f->hexDiff[ ids[1] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature3PDiff;
            if ( f->featureDiff )
                res->featureDiffs[1] = sqrt( f->featureDiff[ ids[1] ] );
            else
                res->featureDiffs[1] = 0.0; /* listing as printed assigns index 0 here */
        }

        f = str->frags + ids[1];
        if ( f->ct && keepCts )
            res->strFrags[1] = makeFragCopy(f->ct, ids[1], -1 );
        res->outside[1] = f->outside;

        f = query->frags + qs3->frag3;
        res->qids[2] = f->id;
        if ( f->ct )
        {
            res->qFrags[2] = f->ct;
            res->hexDiffs[2] = sqrt( f->hexDiff[ ids[2] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature3PDiff;
            if ( f->featureDiff )
                res->featureDiffs[2] = sqrt( f->featureDiff[ ids[2] ] );
            else
                res->featureDiffs[2] = 0.0;
        }

        f = str->frags + ids[2];
        if ( f->ct && keepCts )
            res->strFrags[2] = makeFragCopy(f->ct, ids[2], -1 );
        res->outside[2] = f->outside;

        res->strids[0] = ids[0];
        res->strids[1] = ids[1];
        res->strids[2] = ids[2];
    }
    else
    {
        /* A 2 piece hit */
        qs2 = query->s2 + bestq;
        s2 = str->s2 + bestStr;

        if ( bestIdx == 0 )
        {
            ids[0] = s2->frag1;
            ids[1] = s2->frag2;
        }
        else
        {
            ids[0] = s2->frag2;
            ids[1] = s2->frag1;
        }

        f = query->frags + qs2->frag1;
        sf = str->frags + ids[0];
        res->qids[0] = f->id;
        res->outside[0] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[0] = f->ct;
            res->hexDiffs[0] = sqrt( f->hexDiff[ ids[0] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature2PDiff;
            if ( f->featureDiff )
                res->featureDiffs[0] = sqrt( f->featureDiff[ ids[0] ] );
            else
                res->featureDiffs[0] = 0.0;
        }

        if ( sf->ct && keepCts )
            res->strFrags[0] = makeFragCopy(sf->ct, ids[0], -1 );

        f = query->frags + qs2->frag2;
        sf = str->frags + ids[1];
        res->qids[1] = f->id;
        res->outside[1] = sf->outside;
        if ( f->ct && sf->ct )
        {
            res->qFrags[1] = f->ct;
            res->hexDiffs[1] = sqrt( f->hexDiff[ ids[1] ] );
            if ( q_partialMatch )
                f->featureDiff = f->feature2PDiff;
            if ( f->featureDiff )
                res->featureDiffs[1] = sqrt( f->featureDiff[ ids[1] ] );
            else
                res->featureDiffs[1] = 0.0;
        }

        if ( sf->ct && keepCts )
            res->strFrags[1] = makeFragCopy(sf->ct, ids[1], -1 );
        res->qids[2] = -1;
        res->strids[0] = ids[0];
        res->strids[1] = ids[1];
        res->strids[2] = -1;
    }

    return 0;
}


#if 0

static int debugHits( FILE *fp, Split *query, Split *str, int bestq, int bestStr, int bestIdx, int threeMatched )
{
    split2 *qs2, *s2;
    split3 *qs3, *s3;
    int ids[3];
    Frag *f;
    Frag *sf;

    if ( threeMatched )
    {
        qs3 = query->s3 + bestq;
        s3 = str->s3 + bestStr;

        switch ( bestIdx )
        {
        case 0:
        case 2:
            ids[0] = s3->frag1;
            ids[1] = s3->frag2;
            ids[2] = s3->frag3;
            break;

        case 1:
        case 3:
            ids[0] = s3->frag4;
            ids[1] = s3->frag3;
            ids[2] = s3->frag2;
            break;

#if 0
        case 2:
            ids[0] = s3->frag2;
            ids[1] = s3->frag1;
            ids[2] = s3->frag3;
            break;

        case 3:
            ids[0] = s3->frag2;
            ids[1] = s3->frag3;
            ids[2] = s3->frag1;
            break;

        case 4:
            ids[0] = s3->frag3;
            ids[1] = s3->frag2;
            ids[2] = s3->frag1;
            break;

        case 5:
            ids[0] = s3->frag3;
            ids[1] = s3->frag1;
            ids[2] = s3->frag2;
            break;
#endif

        default:
            return -1;
        }

        f = query->frags + qs3->frag1;
        if ( f->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff[ ids[0] ] ) );
            if ( bestIdx <= 1 )
                writeCopy(fp, f->ct, qs3->frag1, (int) sqrt( f->hexDiff[ ids[0] ] ),
                          "TS_QID" );
            else
                writeCopy(fp, f->ct, qs3->frag4, (int) sqrt( f->hexDiff[ ids[0] ] ),
                          "TS_QID" );

            f = str->frags + ids[0];
            if ( f->ct )
                writeCopy(fp, f->ct, ids[0], -1, "TS_SID");
        }

        f = query->frags + qs3->frag2;
        if ( f->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff[ ids[1] ] ) );
            writeCopy(fp, f->ct, qs3->frag2, (int) sqrt( f->hexDiff[ ids[1] ] ), "TS_QID"
                      );

            f = str->frags + ids[1];
            if ( f->ct )
                writeCopy(fp, f->ct, ids[1], -1, "TS_SID");
        }

        f = query->frags + qs3->frag3;
        if ( f->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff[ ids[2] ] ) );
            writeCopy(fp, f->ct, qs3->frag3, (int) sqrt( f->hexDiff[ ids[2] ] ), "TS_QID");

            f = str->frags + ids[2];
            if ( f->ct )
                writeCopy(fp, f->ct, ids[2], -1, "TS_SID");
        }
    }
    else
    {
        qs2 = query->s2 + bestq;
        s2 = str->s2 + bestStr;

        if ( bestIdx == 0 )
        {
            ids[0] = s2->frag1;
            ids[1] = s2->frag2;
        }
        else
        {
            ids[0] = s2->frag2;
            ids[1] = s2->frag1;
        }

        f = query->frags + qs2->frag1;
        sf = str->frags + ids[0];
        if ( f->ct && sf->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff[ ids[0] ] ) );
            writeCopy(fp, f->ct, qs2->frag1, (int) sqrt( f->hexDiff[ ids[0] ] ), "TS_QID"
                      );
            writeCopy(fp, sf->ct, ids[0], -1, "TS_SID" );
        }

        f = query->frags + qs2->frag2;
        sf = str->frags + ids[1];
        if ( f->ct && sf->ct )
        {
            fprintf(fp,"# diff %8.4lf \n", sqrt( f->hexDiff[ ids[1] ] ) );
            writeCopy(fp, f->ct, qs2->frag2, (int) sqrt( f->hexDiff[ ids[1] ] ), "TS_QID"
                      );
            writeCopy(fp, sf->ct, ids[1], -1, "TS_SID" );
        }
    }

    return 0;
}

#endif

static struct CtConnectionTable *makeFragCopy(struct CtConnectionTable *ct, int id, int hexdiff )
{
    char regName[80];
    char *regid;
    struct CtConnectionTable *copyct;

    copyct = DB_CT_UTL_DUP_CT(ct, CtCopyKeepAllAttrs );
    if ( !copyct )
        return copyct;
    regid = (char *) 0;

    DB_CT_GET_CT_ATTR(ct, CtCtRegId, &regid );
    if ( hexdiff != -1 )
        sprintf(regName,"%s_%d_%d", (regid) ? regid : "str", id + 1, hexdiff);
    else
        sprintf(regName,"%s_%d", (regid) ? regid : "str", id+1 );
    DB_CT_SET_CT_NAME_OR_REGID(copyct, CtCtRegId, regName );

    return copyct;
}

static void setAttr(struct CtConnectionTable *ct, char *name, char *value )
{
    char *tval;

    tval = (char *) 0;

    DB_CT_GET_CT_ATTR(ct, CtCtUserValue, &tval, name );
    if ( tval )
        DB_CT_UTL_MOD_SIMPLE_CT_ATTR(ct, CtCtUserValue, value, name );
    else
        DB_CT_SET_CT_ATTR(ct, CtCtUserValue, value, name );
    UTL_ERROR_CLEAR();
}

static void writeCopy(FILE *fp, struct CtConnectionTable *ct, int id, int hexdiff, char *fieldname )
{
    struct CtConnectionTable *copyct;
    char value[80];

    copyct = makeFragCopy(ct, id, hexdiff );
    if ( !copyct )
        return;
    if ( fieldname )
    {
        sprintf(value,"%d", id+1 );
        setAttr(copyct, fieldname, value );
    }

    DB_CT_WRITE(fp, copyct );
    DB_CT_DELETE_CT(copyct);
}

static int getAtomIds(CtConnectionTable *ct, int a1, int *r_a2, int *r_a3 )
{
    CtAtom *A;
    CtAtom *a3;
    int i;
    CtAtomBondData *b;

    A = ct->atoms + a1;
    *r_a2 = A->bond->toAtom;
    *r_a3 = -1;

    A = ct->atoms + *r_a2;

    for ( i = 0, b = A->bond; i < A->bondCount; i++, b++ )
    {
        if ( b->toAtom != a1 )
        {
            a3 = ct->atoms + b->toAtom;
            if ( *r_a3 == -1 || a3->id.atomicNumber != HYDROGEN )
                *r_a3 = b->toAtom;
            if ( a3->id.atomicNumber != HYDROGEN )
                return 0;
        }
    }

    return -1;
}
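
The atom selection performed by `getAtomIds` can be illustrated on a simplified data structure. This is a sketch under stated assumptions, not the listing's code: `SketchAtom`, `SKETCH_HYDROGEN`, `SKETCH_MAX_BONDS`, and `pick_atom_triple` are hypothetical names, and the real `CtAtom` and `CtAtomBondData` types are richer than what is shown here. The sketch also adds a guard for an atom with no bonds, which the listing leaves implicit.

```c
#define SKETCH_HYDROGEN  1
#define SKETCH_MAX_BONDS 4

/* Hypothetical, simplified atom record used only for this sketch. */
typedef struct {
    int atomic_number;
    int bond_count;
    int to_atom[SKETCH_MAX_BONDS];   /* indices of bonded atoms */
} SketchAtom;

/* Given atom a1, report its first neighbour in *a2 and a neighbour of
 * a2 other than a1 in *a3, preferring a non-hydrogen atom.
 * Returns 0 when a heavy third atom was found, -1 otherwise
 * (in which case *a3 may still name a hydrogen, or be -1). */
static int pick_atom_triple(const SketchAtom *atoms, int a1, int *a2, int *a3)
{
    const SketchAtom *mid;
    int i;

    *a2 = -1;
    *a3 = -1;
    if (atoms[a1].bond_count < 1)
        return -1;                   /* guard not present in the listing */

    *a2 = atoms[a1].to_atom[0];
    mid = &atoms[*a2];
    for (i = 0; i < mid->bond_count; i++) {
        int cand = mid->to_atom[i];
        if (cand == a1)
            continue;                /* do not walk back to the start atom */
        if (*a3 == -1 || atoms[cand].atomic_number != SKETCH_HYDROGEN)
            *a3 = cand;
        if (atoms[cand].atomic_number != SKETCH_HYDROGEN)
            return 0;                /* first heavy atom wins */
    }
    return -1;
}
```

For a methanol-like fragment (H bonded to C, with C also bonded to another H and to O), starting from the first hydrogen yields the carbon as the second atom and the oxygen as the third, because the hydrogen candidate is overwritten as soon as a heavy atom is seen.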

/***************************************************#^^ 
10 modified from: 

* int SYB_MGEN_CONN_CFA_DIFF( identifier, nargs, args, writer ) * 

* Dick Cramer, Nov. 20, 1996 

* Computes difference between two CoMFA fields, represented as text 
15 * strings encoded by the expression generator %cfa_hex() 

C function CTFIELD2HEX0 

* * 

********************************************** 

static double fieldHexDiff( char *cptr, char *cqtr, int nosq )
{
#define pow2(a) ( (a) * (a) )
    static double boundary[16];
    static double Dist[16][16];
    static double DnSq[16][16];
    static int InitDist;
    double xount;
    int i, j, nch, ptr, qtr;
    char tempString[25];

    if ( !cptr || !cqtr )
        return 999999.0;
    if ( (nch = strlen(cptr)) != strlen(cqtr) )
        return 999999.0;

    /* initialization on 1st call */
    if (!InitDist)
    {
        boundary[0] = 9999.;
        boundary[1] = -0.1;
        for (i=2;i<15;i++)
            boundary[i] = 2*i-3;
        boundary[15] = 30.0;

        for (i=0;i<16;i++)
        {
            for(j=0;j<16;j++)
            {
                DnSq[i][j] = (double) fabs( boundary[i] - boundary[j] );
                Dist[i][j] = pow2( boundary[i] - boundary[j] );
            }
        }

        InitDist = 1;
    }

    /* each pair of hex characters holds two 4-bit bin indices */
    for (xount=0.0, i=0; i<nch; i += 2, cptr += 2, cqtr += 2)
    {
        sscanf( cptr, "%2x", &ptr );
        sscanf( cqtr, "%2x", &qtr );
        xount += nosq ?
            DnSq[ ptr&0x0F ][ qtr&0x0F ]
            + DnSq[ (ptr & 0xF0) >> 4][ (qtr & 0xF0) >> 4]
            :
            Dist[ ptr&0x0F ][ qtr & 0x0F ]
            + Dist[ (ptr & 0xF0) >> 4][ (qtr & 0xF0) >> 4] ;
    }

    return (nosq && xount > 0.0 ) ? xount : sqrt( xount );
}
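The lookup-table idea in fieldHexDiff can be illustrated in isolation. The following is a minimal sketch, not the listing's code: it rebuilds the same 16 bin boundaries and sums squared boundary differences over two equal-length strings of hex digits, one 4-bit bin per digit. The names bin_boundary, hex_digit_value and hex_field_sq_diff are hypothetical, and the sketch parses digits directly instead of calling sscanf and returns -1.0 (rather than 999999.0) on bad input.

```c
#include <stddef.h>
#include <string.h>

/* Bin boundaries as initialized in the listing: bin 0 is a sentinel,
   bin 1 is -0.1, bins 2..14 are 1, 3, ..., 25, and bin 15 is 30.0. */
static double bin_boundary(int bin)
{
    if (bin == 0)  return 9999.0;
    if (bin == 1)  return -0.1;
    if (bin == 15) return 30.0;
    return 2.0 * bin - 3.0;
}

/* Value of one hex digit (0-9, a-f, A-F), or -1 if it is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Sum of squared boundary differences over all digits; -1.0 on bad input. */
static double hex_field_sq_diff(const char *p, const char *q)
{
    size_t i, n;
    double sum = 0.0;

    if (!p || !q || (n = strlen(p)) != strlen(q))
        return -1.0;
    for (i = 0; i < n; i++) {
        int a = hex_digit_value(p[i]);
        int b = hex_digit_value(q[i]);
        double d;
        if (a < 0 || b < 0)
            return -1.0;
        d = bin_boundary(a) - bin_boundary(b);
        sum += d * d;
    }
    return sum;
}
```

Because the table has only 16 x 16 entries, a production version would normally precompute it once, as the listing does with its static Dist array.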

static char *hexStringToInts(char *cptr, int *r_size)
{
    int len, i, idx;
    char *arr;

    *r_size = 0;
    if ( !cptr )
        return (char *) 0;
    len = strlen(cptr);
    arr = malloc(len);
    for ( i = idx = 0; i < len; i++, cptr++, idx++ )
    {
        /* lower-case hex digits only: '0'-'9' map to 0-9, 'a'-'f' to 10-15 */
        if ( *cptr <= '9' )
            arr[idx] = *cptr - '0';
        else
            arr[idx] = *cptr - 'a' + 10;
    }

    *r_size = len;
    return arr;
}


static double *compressField(double *topfield, int npoints )
{
    static double minv = -0.40;
    static double maxv = 0.40;
    static int nreported;
    static int max_alloc;
    static double *tbuff;
    static int ncomp;
    static int tpoints;
    static int newPoints;
    int cnt;
    double *tptr;
    double *cfield;
    int dsize;
    int i;
    double *fptr;
    int needpoints;
#ifdef NUMBER_OF_COMPRESSION_FIELDS
    double totals[NUMBER_OF_COMPRESSION_FIELDS];
    int cnts[NUMBER_OF_COMPRESSION_FIELDS];
    int gridsize;
    int grid;

    gridsize = npoints / NUMBER_OF_COMPRESSION_FIELDS;
    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS; i++ )
    {
        totals[i] = 0.0;
        cnts[i] = 0;
    }
#endif

    needpoints = npoints + COMPRESSION_POINTS;

    /* grow the shared work buffer by doubling when it is too small */
    if ( needpoints > max_alloc )
    {
        if (tbuff)
            free( (char *) tbuff);
        if ( max_alloc == 0 )
            max_alloc = 2000;
        while ( max_alloc < needpoints )
            max_alloc *= 2;
        tbuff = (double *) malloc(sizeof(double) * max_alloc );
    }

    for ( i = cnt = dsize = COMPRESSION_POINTS, tptr = tbuff + COMPRESSION_POINTS,
          fptr = topfield; i < npoints; i++, fptr++ )
    {
        /* a value inside (minv, maxv) extends a run, or starts one if the next value is also inside */
        if ( ( *fptr < maxv && *fptr > minv ) &&
             (cnt > 0 || ((i+1) < npoints && *(fptr+1) < maxv && *(fptr+1) > minv ) ) )
            cnt++;
        else
        {
            if ( cnt )
            {
                /* close the run: store its length + 100, then the current value */
                *tptr++ = (double) (cnt + 100);
                *tptr++ = *fptr;
                dsize += 2;
                cnt = 0;
            }
            else
            {
                *tptr++ = *fptr;
                dsize++;
            }
#ifdef NUMBER_OF_COMPRESSION_FIELDS
            if (*fptr > 1.0)
            {
                grid = i / gridsize;
                if ( grid >= NUMBER_OF_COMPRESSION_FIELDS )
                    grid = NUMBER_OF_COMPRESSION_FIELDS - 1;
                cnts[grid] += 1;
                totals[grid] += *fptr * *fptr;
            }
#endif
        }
    }

    if ( cnt )
    {
        *tptr++ = (double) (cnt + 100);
        dsize++;
    }

#ifdef NUMBER_OF_COMPRESSION_FIELDS
    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS; i++ )
    {
        tbuff[i] = 0.0;
        if ( cnts[i] > 0 )
            tbuff[i] = totals[i] / (double) cnts[i];
        tbuff[ i + NUMBER_OF_COMPRESSION_FIELDS] = cnts[i];
    }
#endif

cfield = (double *) malloc(sizeof(double) * dsize ); 
memcpy((char *) cfield, tbuff, sizeof(double) * dsize ); 

#if 0
    if ( nreported < 3 )
    {
        ncomp++;
        tpoints += npoints;
        newPoints += dsize;

        if ( ncomp == 1000 )
        {
            fprintf(stderr,"compression average for last %d frags: %6.2lf %d / %d\n",
                ncomp,
                (double) (newPoints * 100) / (double) tpoints,
                newPoints, tpoints );
            tpoints = newPoints = ncomp = 0;
            nreported++;
        }
    }
#endif
#if 0
    fprintf(stderr, "compressed perc: %5.1lf new size: %d old size: %d\n",
        (double) (dsize*100)/(npoints), dsize, npoints );
#endif
#if 0
    fprintf(stderr, "un-compressed\n");
    for ( i = 0, fptr = topfield; i < npoints; i++, fptr++ )
        fprintf(stderr,"%6.2lf%s", *fptr, ((i+1) % 20) ? " " : "\n" );
    fprintf(stderr, "\ncompressed:\n");
    for ( i = 0, fptr = cfield; i < dsize; i++, fptr++ )
        fprintf(stderr,"%6.2lf%s", *fptr, ((i+1) % 20) ? " " : "\n" );
    fprintf(stderr,"\n");
#endif

return cfield; 

} 
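The run-length scheme used by compressField can be shown on its own. This is a simplified sketch under stated assumptions, not the listing's implementation: it omits the COMPRESSION_POINTS header and the static work buffer, and it assumes all real field values stay below 100 (the listing caps them at 30.0) so that a stored value above 100 can only be a run marker. The names RUN_MARK, NEAR_ZERO, near_zero and rle_compress are hypothetical.

```c
#include <stddef.h>

#define RUN_MARK  100.0  /* a stored value above this encodes a run length */
#define NEAR_ZERO 0.40   /* values strictly inside (-0.40, 0.40) count as zero */

static int near_zero(double v)
{
    return v > -NEAR_ZERO && v < NEAR_ZERO;
}

/* Writes the compressed form of in[0..n-1] to out (capacity >= n)
   and returns the number of doubles written. */
static size_t rle_compress(const double *in, size_t n, double *out)
{
    size_t i = 0, w = 0;

    while (i < n) {
        size_t run = 0;

        /* measure the run of near-zero values starting at i */
        while (i + run < n && near_zero(in[i + run]))
            run++;
        if (run >= 2) {
            out[w++] = RUN_MARK + (double) run;  /* one marker replaces the run */
            i += run;
        } else {
            out[w++] = in[i++];                  /* isolated values are stored as-is */
        }
    }
    return w;
}
```

Since every run of two or more points collapses to a single marker, the output can never be longer than the input, which is why a caller-supplied buffer of n doubles is sufficient here.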

static double topFieldCompressedDiff(double *start_qry, double *start_str, int npoints, double startPenalty )
{
    int i, j, k, minval;
    double dval, qval, sval, filtval;
    int qrySkip, strSkip;
    int qpoints, spoints;
    double *qry, *str;
#ifdef NUMBER_OF_COMPRESSION_FIELDS
    int distCnt1[NUMBER_OF_COMPRESSION_FIELDS];
    int distCnt2[NUMBER_OF_COMPRESSION_FIELDS];
    int dist;
    double avgval;
    double avg1[NUMBER_OF_COMPRESSION_FIELDS];
    double avg2[NUMBER_OF_COMPRESSION_FIELDS];
#endif

    if ( !start_qry || !start_str || !npoints )
        return 9999.0*9999.0;
    t_fcompare++;

#ifdef NUMBER_OF_COMPRESSION_FIELDS
    /* quick filter: estimate the difference from the per-region summaries first */
    filtval = startPenalty;
    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS; i++ )
    {
        avg1[i] = start_qry[i];
        distCnt1[i] = start_qry[i + NUMBER_OF_COMPRESSION_FIELDS];
        avg2[i] = start_str[i];
        distCnt2[i] = start_str[i + NUMBER_OF_COMPRESSION_FIELDS];
#if 0
        fprintf(stderr,"%d: cnts: %d vs %d avg:%9.3lf %9.3lf\n",
            i, distCnt1[i], distCnt2[i], avg1[i], avg2[i] );
#endif
    }

    for ( i = 0; i < NUMBER_OF_COMPRESSION_FIELDS && filtval < q_bailout; i++ )
    {
        dist = abs(distCnt1[i] - distCnt2[i] );
        if (distCnt1[i] > distCnt2[i] )
        {
            dist = distCnt1[i] - distCnt2[i];
            avgval = avg1[i];
        }
        else
        {
            dist = distCnt2[i] - distCnt1[i];
            avgval = avg2[i];
        }
        filtval += avgval * (double) dist;
    }

    if ( filtval > q_bailout )
        return filtval;
#endif

    i = 0;
    sval = 0.0;
    strSkip = qrySkip = 0;
    qpoints = spoints = 0;

    qry = start_qry + COMPRESSION_POINTS;
    str = start_str + COMPRESSION_POINTS;

    while ( qpoints < npoints && spoints < npoints && sval < q_bailout )
    {
        if ( qrySkip < 0 )
            qrySkip = 0;
        if ( strSkip < 0 )
            strSkip = 0;

        /* a stored value above 100 is a run marker: (value - 100) points to skip */
        if ( qrySkip == 0 && *qry > 100.0 )
        {
            qrySkip = (int) (*qry - 100.0);
        }

        if ( strSkip == 0 && *str > 100.0 )
        {
            strSkip = (int) (*str - 100.0);
        }

/* Example: 
compressed: Query 

117.00 3.18 3.21 104.00 30.00 30.00 30.00 103.00 30.00 30.00 30.00 1.17 103.00 26.87 
30.00 30.00 5.30 117.00 29.64 4.78 

30.00 30.00 0.20 101.00 5.30 30.00 30.00 30.00 30.00 13.90 101.00 30.00 30.00 30.00 
30.00 4.77 102.00 3.72 30.00 30.00 

30.00 1.05 117.00 29.64 5.86 30.00 30.00 0.19 101.00 5.33 30.00 30.00 30.00 30.00 
13.89 101.00 30.00 30.00 30.00 30.00 

4.54 102.00 3.61 30.00 27.54 120.00 0.19 3.70 3.84 104.00 30.00 30.00 30.00 103.00 
1.13 15.09 3.12 0.25 
compressed: Str 

122.00 1.76 0.67 105.00 30.00 30.00 1.47 104.00 1.75 0.68 125.00 3.64 21.47 30.00 
9.03 103.00 30.00 30.00 30.00 26.83 

103.00 3.65 21.46 30.00 9.12 119.00 0.31 8.11 103.00 3.64 19.21 30.00 30.00 103.00 
30.00 30.00 30.00 30.00 0.28 102.00 

3.65 19.31 30.00 30.00 119.00 1.44 24.84 2.35 104.00 30.00 30.00 30.00 103.00 15.28 
30.00 30.00 30.00 1.40 103.00 7.38 

30.00 0.21 119.00 1.64 3.18 105.00 30.00 30.00 105.00 30.00 30.00 
*/ 


        if ( strSkip == 0 && qrySkip == 0 )
        {
            while (spoints < npoints && qpoints < npoints && *str < 100.0 && *qry < 100.0 )
            {
                dval = (*str - *qry) * autoScaleFactor;
                dval *= dval;
                sval += dval;
                str++;
                qry++;
                qpoints++;
                spoints++;
            }
        }
        else
        {
#if 0
            fprintf(stderr, "start: %d %d %d %d %d %8.2lf strIdx:%d qryIdx:%d\n",
                strSkip, qrySkip, spoints, qpoints, npoints, sval,
                (int) (str - start_str), (int) (qry - start_qry) );
#endif
            if ( strSkip > qrySkip )
            {
                if ( qrySkip > 0 )
                {
                    qpoints += qrySkip;
                    spoints += qrySkip;
                    strSkip -= qrySkip;
                    qrySkip = 0;
                    qry++;
                }
                while (strSkip && qpoints < npoints && *qry < 100.0 )
                {
                    dval = *qry * autoScaleFactor;
                    dval *= dval;
                    sval += dval;
                    strSkip--;
                    qpoints++;
                    spoints++;
                    qry++;
                }
                if (strSkip == 0)
                    str++;
            }
            else if ( qrySkip > strSkip )
            {
                if ( strSkip > 0 )
                {
                    qpoints += strSkip;
                    spoints += strSkip;
                    qrySkip -= strSkip;
                    strSkip = 0;
                    str++;
                }
                while ( qrySkip && spoints < npoints && *str < 100.0 )
                {
                    dval = *str * autoScaleFactor;
                    dval *= dval;
                    sval += dval;
                    qrySkip--;
                    qpoints++;
                    spoints++;
                    str++;
                }
                if ( qrySkip == 0 )
                    qry++;
            }
            else
            {
                /* they are the same, what luck */
                qpoints += qrySkip;
                spoints += strSkip;
                qrySkip = 0;
                strSkip = 0;
                str++;
                qry++;
            }
        }
    }

    /* Only one of the while loops can process */
    while ( qpoints < npoints )
    {
        if ( *qry < 100.0 )
        {
            dval = *qry * autoScaleFactor;
            dval *= dval;
            sval += dval;
            qpoints++;
        }
        else
        {
            qrySkip = (int) (*qry - 100.0);
            qpoints += qrySkip;
        }
        qry++;
    }

    while ( spoints < npoints )
    {
        if ( *str < 100.0 )
        {
            dval = *str * autoScaleFactor;
            dval *= dval;
            sval += dval;
            spoints++;
        }
        else
        {
            strSkip = (int) (*str - 100.0 );
            spoints += strSkip;
        }
        str++;
    }

#if 0
    if ( filtval > sval )
    {
        fprintf(stderr," fill higher than actual: %8.4lf actual: %8.4lf\n", filtval, sval );
    }
    if ( sval > q_bailout )
    {
        fprintf(stderr, "ACTUAL more than bailout: %8.3lf filtval: %8.3lf bail:%8.3lf \n", sval,
            filtval, q_bailout );
    }
    if ( filtval > q_bailout )
        fprintf(stderr, "compressed field bailout %8.4lf actual: %8.4lf bailout: %8.4lf %s\n",
            filtval, sval, q_bailout,
            (sval > q_bailout ) ? "WORKED" : "FAILED" );
#endif
    return sval;
}
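As far as the listing can be read, topFieldCompressedDiff walks both compressed arrays in lockstep and treats every skipped point as zero. Under that reading, and with autoScaleFactor taken as 1 and no bailout, its result should equal what one gets by expanding both arrays and summing squared differences. The sketch below shows that reference computation; rle_expand and sum_sq_diff are hypothetical names, and the header of COMPRESSION_POINTS summary values is left out.

```c
#include <stddef.h>

/* Expands a run-length-compressed field into out[0..npoints-1].
   A stored value above 100 stands for (value - 100) zero points.
   Returns the number of points written. */
static size_t rle_expand(const double *in, size_t nin, double *out, size_t npoints)
{
    size_t i, w = 0;

    for (i = 0; i < nin && w < npoints; i++) {
        if (in[i] > 100.0) {
            size_t run = (size_t) (in[i] - 100.0);
            while (run-- > 0 && w < npoints)
                out[w++] = 0.0;
        } else {
            out[w++] = in[i];
        }
    }
    return w;
}

/* Plain sum of squared differences over npoints values. */
static double sum_sq_diff(const double *a, const double *b, size_t npoints)
{
    size_t i;
    double sum = 0.0;

    for (i = 0; i < npoints; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

The lockstep walk in the listing avoids allocating the expanded arrays and can stop early once the running sum passes the bailout value, which this reference version does not attempt.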

static double topFieldDiff(double *qry, double *str, int npoints )
{
    double dval;
    double sval;
    int i;

    if ( !qry || !str || !npoints )
        return 9999.0*9999.0;

    for ( i = 0, sval = 0.0; i < npoints; i++ )
    {
        dval = *qry++ - *str++;
        dval *= dval;
        sval += dval;
    }

    t_fcompare++;
    return sval;
}

static double fieldIntDiff( char *cptr, char *cqtr, int s1, int s2)
{
    static double Dist[16][16];
    static int InitDist;
    double xount;
    int i;

    if (s1 != s2 || !cptr || !cqtr)
        return 999999.0;

    /* initialization on 1st call */
    if (!InitDist)
    {
        int j;
        double dval;
        double boundary[16];

        boundary[0] = 9999.;
        boundary[1] = -0.1;
        for (i=2;i<15;i++)
            boundary[i] = 2*i-3;
        boundary[15] = 30.0;

        for (i=0;i<16;i++)
        {
            for(j=0;j<16;j++)
            {
                dval = boundary[i] - boundary[j];
                Dist[i][j] = dval * dval;
            }
        }
        InitDist = 1;
    }

    /* each element of cptr/cqtr is already a bin index in the range 0..15 */
    for (xount=0.0, i=0; i < s1; i++, cptr++, cqtr++ )
    {
        xount += Dist[*cptr][*cqtr];
    }

    t_fcompare++;
    return xount;
}

#if 0
static double 2nd_fieldIntDiff( unsigned short *cptr, unsigned short *cqtr, int s1, int s2)
{
#define pow2(a) ( (a) * (a) )
    static double boundary[16];
    static double Dist[16][16];
    static double DnSq[16][16];
    static int InitDist;
    double xount;
    double dval;
    int i, j, nch, ptr, qtr;
    char tempString[25];

    if (s1 != s2 || !cptr || !cqtr)
        return 999999.0;

    /* initialization on 1st call */
    if (!InitDist)
    {
        boundary[0] = 9999.;
        boundary[1] = -0.1;
        for (i=2;i<15;i++)
            boundary[i] = 2*i-3;
        boundary[15] = 30.0;

        for (i=0;i<16;i++)
        {
            for(j=0;j<16;j++)
            {
                dval = boundary[i] - boundary[j];
                DnSq[i][j] = (double) fabs( dval );
                Dist[i][j] = dval * dval;
            }
        }

        InitDist = 1;
    }

    for (xount=0.0, i=0; i < s1; i++, cptr++, cqtr++ )
    {
        ptr = (int) *cptr;
        qtr = (int) *cqtr;

        xount += Dist[ ptr & 0x0F ][ qtr & 0x0F ]
            + Dist[ (ptr & 0xF0) >> 4][ (qtr & 0xF0) >> 4];
    }

    t_fcompare++;
    return xount;
}

static double fieldIntDiffSq( unsigned short *cptr, unsigned short *cqtr, int s1, int s2)
{
    double rval;

    if (s1 != s2 || !cptr || !cqtr)
        return 999999.0;
    rval = fieldIntDiff( cptr, cqtr, s1, s2 );
    if ( rval <= 0.0 )
        return 0.0;
    return sqrt( rval );
}

#endif


int TOP_GET_STATS(int dumpRegions, int *r_tfrags, int *r_2compare, int *r_3compare, int
    *r_fcompare, int *r_filtered, int *r_feat, double *r_outsidePerc )
{
    double perc;
    double tregions;
    int i;

    *r_tfrags = tot_uniq_frags;
    *r_2compare = t_2compare;
    *r_3compare = t_3compare;
    *r_fcompare = t_fcompare;
    *r_filtered = t_filtered;
    *r_feat = t_featFiltered;
    if ( t_fields )
    {
        perc = ( (double) t_outside * 100.0 ) / (double) t_fields;
        *r_outsidePerc = perc;
    }
    else
        *r_outsidePerc = 0.0;

    if ( dumpRegions )
    {
        for ( i = tregions = 0; i < max_regions; i++ )
        {
            tregions += regionUseCnts[i];
        }
        if ( tregions )
        {
            fprintf(stderr, "Region stats:\n");
            for ( i = 0; i < max_regions; i++ )
                fprintf(stderr,"%5.2lf ", ( (double) regionUseCnts[i] * 100.0 ) / (double)
                    tregions );
            fprintf(stderr,"\n");
        }
    }
}


static double computeAttachmentPenalty( Frag *qry, Frag *str, Frag *other_qry, Frag *other_str )
{
    double *qry_cords;
    double *str_cords;
    double dx, dy, dz;
    double pen;

    if ( !qry->cords || !str->cords )
        return 0.0;
    pen = 0.0;

#if 0
    /*
    The query cords and structure cords copyBaseAtom point to the origin, so
    we don't need to compare them, we need to compare the other base atom, where
    it is now.

    Don't need to do this set, it's always zero, this is the atom which is at the origin.
    */
    qry_cords = qry->cords + (qry->copyBaseAtom*3);
    str_cords = str->cords + (str->copyBaseAtom*3);

    dx = *qry_cords - *str_cords;
    dy = *(qry_cords+1) - *(str_cords+1);
    dz = *(qry_cords+2) - *(str_cords+2);

    pen = (dx*dx + dy*dy + dz*dz) * q_attachPenFactor;
#ifdef DEBUG_DETAIL
    if ( q_debugfp )
        fprintf(q_debugfp, "# attach qry: %d str:%d %6.2lf %6.2lf %6.2lf %8.3lf (atoms: %d %d) (bases: %d %d %d %d)\n",
            qry->id + 1, str->id + 1, dx, dy, dz, pen,
            qry->ct->atomCount, str->ct->atomCount,
            qry->copyBaseAtom, str->copyBaseAtom, other_qry->copyBaseAtom,
            other_str->copyBaseAtom );
#endif
#endif

    /* compare where the other fragment's base atom sits in each coordinate set */
    qry_cords = qry->cords + (other_qry->copyBaseAtom*3);
    str_cords = str->cords + (other_str->copyBaseAtom*3);

    dx = *qry_cords - *str_cords;
    dy = *(qry_cords+1) - *(str_cords+1);
    dz = *(qry_cords+2) - *(str_cords+2);

    pen += (dx*dx + dy*dy + dz*dz) * q_attachPenFactor;

    return pen;
}

static int double_compare(const void *vnrec, const void *vtrec )
{
    double *n = (double *) vnrec;
    double *t = (double *) vtrec;

    /* compare explicitly so that values less than 1.0 apart are still ordered */
    return (*n > *t) - (*n < *t);
}
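The arithmetic in computeAttachmentPenalty reduces to a scaled squared distance between two points taken from flat x,y,z coordinate arrays. A minimal sketch of that calculation follows; attach_penalty and atom_xyz are hypothetical helper names, and the factor argument stands in for the global q_attachPenFactor used by the listing.

```c
/* Coordinates are stored as consecutive x,y,z triples,
   so atom number `atom` starts at cords + 3 * atom. */
static const double *atom_xyz(const double *cords, int atom)
{
    return cords + atom * 3;
}

/* Squared distance between two 3-D points, scaled by a penalty factor. */
static double attach_penalty(const double *a, const double *b, double factor)
{
    double dx = a[0] - b[0];
    double dy = a[1] - b[1];
    double dz = a[2] - b[2];

    return (dx * dx + dy * dy + dz * dz) * factor;
}
```

For example, with the second atom of the query at (1, 2, 2) and the second atom of the structure at (1, 0, 0), the squared distance is 8, so a factor of 0.5 gives a penalty of 4.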

static void PartialMatchFeatures(Split *qs, int mode, Frag *q1, Frag *q2, Frag *q3, Frag *q4, Split *str,
    Frag *f1, Frag *f2, Frag *f3, Frag *f4, int matchCnt )
{
    double *aa, *da;
    double *either;
    int *both;
    double splitDiff;
    int i, cnt;
    int atomCount;
    int fcnt1, fcnt2, fcnt3, fcnt4;
    int noFrags;
    static Split *last_split;

    if ( !qs || !qs->ct || !q1 || !q2 || !str || !f1 || !f2 || matchCnt == 0 || !qs->featureMask)
        return;

    if ( last_split != qs)
        qs->connectedHBCnt = (int *) 0;
    last_split = qs;

    atomCount = qs->ct->atomCount;
    aa = (double *) calloc(sizeof(double), atomCount );
    da = (double *) calloc(sizeof(double), atomCount );
    either = (double *) calloc(sizeof(double), atomCount );
    both = (int *) calloc(sizeof(int), atomCount );

    /* -1.0 marks "no acceptor/donor penalty recorded for this atom yet" */
    for ( i = 0; i < atomCount; i++ )
    {
        either[i] = da[i] = aa[i] = -1.0;
    }

    if ( mode == 2 )
    {
        q1->featureDiff = q1->feature2PDiff;
        q2->featureDiff = q2->feature2PDiff;
        if ( q3 )
            q3->featureDiff = q3->feature2PDiff;
        if ( q4 )
            q4->featureDiff = q4->feature2PDiff;
    }
    else if ( mode == 3 )
    {
        q1->featureDiff = q1->feature3PDiff;
        q2->featureDiff = q2->feature3PDiff;
        if ( q3 )
            q3->featureDiff = q3->feature3PDiff;
        if ( q4 )
            q4->featureDiff = q4->feature3PDiff;
    }
    else
    {
        q1->featureDiff = q1->featureSubsetDiff;
        q2->featureDiff = q2->featureSubsetDiff;
        if ( q3 )
            q3->featureDiff = q3->featureSubsetDiff;
        if ( q4 )
            q4->featureDiff = q4->featureSubsetDiff;
    }

    fcnt1 = fcnt2 = fcnt3 = fcnt4 = 0;

    q1->featureDiff[f1->id] = MeasureClosest(qs, q1, str, f1, da, aa, &fcnt1 );
    q2->featureDiff[f2->id] = MeasureClosest(qs, q2, str, f2, da, aa, &fcnt2 );
    if ( q3 && f3 )
        q3->featureDiff[f3->id] = MeasureClosest(qs, q3, str, f3, da, aa, &fcnt3 );
    if ( q4 && f4 )
        q4->featureDiff[f4->id] = MeasureClosest(qs, q4, str, f4, da, aa, &fcnt4 );

    noFrags = 0;
    if ( fcnt1 )
        noFrags++;
    if ( fcnt2 )
        noFrags++;
    if ( fcnt3 )
        noFrags++;
    if ( fcnt4 )
        noFrags++;

    /* for each atom keep the smaller of its donor and acceptor penalties */
    for ( i = cnt = 0; i < atomCount; i++ )
    {
        if ( da[i] != -1.0 && ( either[i] == -1.0 || da[i] < either[i] ) )
            either[i] = da[i], cnt++;
        if ( aa[i] != -1.0 && ( either[i] == -1.0 || aa[i] < either[i] ) )
            either[i] = aa[i], cnt++;
        if ( da[i] != -1.0 && aa[i] != -1.0 )
            both[i] = 1;
    }
#if 0
    fprintf(stderr,"%d %d %d %d frags:%d frag_cnt:%d\n", fcnt1, fcnt2, fcnt3, fcnt4, noFrags,
        cnt );
#endif
    CoverConnectedHB(qs, qs->ct, either );

    for ( i = cnt = 0, splitDiff = 0.0; i < atomCount; i++ )
    {
        if ( either[i] != -1.0 )
        {
            if ( both[i] == 0 )
                aa[cnt] = either[i];
            else
                aa[cnt] = ( aa[i] + da[i] ) / 2.0;
            splitDiff += aa[cnt];
            cnt++;
        }
    }

    /* partial match: only the matchCnt smallest penalties are summed */
    if ( cnt > matchCnt )
        qsort( (void *) aa, (size_t) cnt, (size_t) sizeof(double),
            double_compare );

    for ( i = 0, splitDiff = 0.0; i < matchCnt && i < cnt; i++ )
        splitDiff += aa[i];

#if 0
    for (i = 0; i < cnt; i++ )
        fprintf(stderr,"%8.2lf ", aa[i] );
    if ( cnt )
        fprintf(stderr,"\n");
#endif
    splitDiff *= q_featureFactor;

    if (cnt == 1 )
        splitDiff *= 2.0; /* If there is only one donor or acceptor, increase the weighting
                             automatically. Always a good thing. */
    if ( noFrags > 1 )
    {
        splitDiff /= (double) noFrags;
    }

    q1->featureDiff[f1->id] = q2->featureDiff[f2->id] = 0.0;
    if (q3)
        q3->featureDiff[f3->id] = 0.0;
    if (q4)
        q4->featureDiff[f4->id] = 0.0;

    if ( fcnt1 )
        q1->featureDiff[f1->id] += splitDiff;
    if (fcnt2)
        q2->featureDiff[f2->id] += splitDiff;
    if (fcnt3)
        q3->featureDiff[f3->id] += splitDiff;
    if (fcnt4)
        q4->featureDiff[f4->id] += splitDiff;

    free((char *) aa);
    free((char *) da);
    free((char *) either);
    free((char *) both );
    return;
}
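The partial-match step in PartialMatchFeatures, which keeps only the matchCnt smallest per-atom penalties, can be sketched separately. The code below is an illustration under assumptions rather than the listing's routine: cmp_double_asc and best_match_sum are hypothetical names, and the later scaling by q_featureFactor and division by the fragment count are left out.

```c
#include <stdlib.h>

/* Three-way comparison of two doubles for qsort, ascending order. */
static int cmp_double_asc(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Returns the sum of the match_cnt smallest values in pen[0..cnt-1],
   or of all values when cnt <= match_cnt. Sorts pen in place when needed. */
static double best_match_sum(double *pen, int cnt, int match_cnt)
{
    int i;
    double sum = 0.0;

    if (cnt > match_cnt)
        qsort(pen, (size_t) cnt, sizeof(double), cmp_double_asc);
    for (i = 0; i < match_cnt && i < cnt; i++)
        sum += pen[i];
    return sum;
}
```

Sorting is skipped when every penalty will be summed anyway, which mirrors the cnt > matchCnt test in the listing.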

static void CoverConnectedHB(Split *qs, struct CtConnectionTable *ct, double *HB )
{
    CtAtom *A;
    CtAtomBondData *bond;
    int queryMask;
    int aHB;
    int i, j, k, idx, cnt, coverCnt;
    int *Worse;

    aHB = FeatureHBA | FeatureHBD;
    if ( !qs->connectedHBCnt )
    {
        /* build, once per split, the list of bonded neighbours that are also donors/acceptors */
        qs->connectedHBCnt = (int *) calloc(sizeof(int), ct->atomCount );
        qs->connectedHBAtoms = (int *) calloc(sizeof(int), ct->atomCount * 5 );
        qs->connectedHBTotalCnt = 0;
        for ( i = 0, A = ct->atoms; i < ct->atomCount; i++, A++ )
        {
            queryMask = qs->featureMask[ i ];
            if ( queryMask & aHB )
            {
                for ( cnt = j = 0, bond = A->bond; j < A->bondCount && j < 5;
                      j++, bond++ )
                {
                    queryMask = qs->featureMask[ bond->toAtom ];
                    if ( queryMask & aHB )
                    {
                        idx = i*5 + cnt;
                        qs->connectedHBCnt[i] += 1;
                        qs->connectedHBAtoms[idx] = bond->toAtom;
                        qs->connectedHBTotalCnt++;
                        cnt++;
                    }
                }
            }
        }
    }

    if ( qs->connectedHBTotalCnt == 0 )
        return;

    Worse = (int *) calloc(sizeof(int), ct->atomCount );

    for ( j = 1; j < 5; j++ )
    {
        for ( i = 0; i < ct->atomCount; i++ )
        {
            if ( qs->connectedHBCnt[i] != j )
                continue;
            for ( k = 0; k < qs->connectedHBCnt[i]; k++ )
            {
                idx = i*5 + k;
                if ( HB[i] > HB[ qs->connectedHBAtoms[idx] ] )
                {
                    Worse[i] = 1;
                }
            }
        }
    }

    /* drop any atom whose penalty is worse than that of a connected donor/acceptor */
    for ( i = 0; i < ct->atomCount; i++ )
    {
        if ( Worse[i] )
            HB[i] = -1.0;
    }
    free((char *) Worse);
    return;
}


static double MeasureClosest(Split *qs, Frag *q1, Split *str, Frag *f1, double *da, double *aa, int
    *r_fcnt )
{
    int *qmask;
    int *smask;
    int i, j, k;
    double best = 99999.0;
    int found = -1;
    double worst;
    int qid, sid;
    int *qMap, *strMap;
    FeatureType qfeature, strFeature;
    double x, y, z;
    double distsq;
    double otherDiff = 0.0;
    double *qryCords, *strCords;
    double attFact;
    double fieldDiff = 0.0;
    double extraDiff = 0.0;
    double featDiff;
    int centAtoms[6];
    int cidx;
    AromSet *qset, *strSet;
    int *covered;
    static int featureCnt[4];
    static int extraFeatureCnt[4];
    int queryHB;
    int strHB;
    int origIdx;

    *r_fcnt = 0;

    featureCnt[0] = featureCnt[1] = featureCnt[2] = featureCnt[3] = 0;
    extraFeatureCnt[0] = extraFeatureCnt[1] = extraFeatureCnt[2] = extraFeatureCnt[3] = 0;
    qmask = qs->featureMask;
    smask = str->featureMask;
    qMap = q1->origMapping;
    strMap = f1->origMapping;

    if ( !q1->cords || !f1->cords )
    {
        return otherDiff;
    }

    covered = (int *) calloc(f1->atomCnt,sizeof(int) );

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "\n# Feature comparison Query Id: %d Structure Id: %d\n",
            q1->id + 1, f1->id + 1 );
    }
#endif

    /* do the single atom features first */
    for ( i = 0; i < q1->atomCnt; i++ )
    {
        if ( qmask[ qMap[i] ] == FeatureNone )
            continue; /* no single atom feature at this atom */

        origIdx = qMap[i];
        qfeature = qmask[qMap[i]];
        for ( k = 0; k < 4; k++ )
        {
            if ( !( qfeature & fMasks[k] ) )
                continue;
            best = 999999.0;
            found = -1;
            worst = (double) featureWeights[k+1] * featureWeights[k+1];
            /* find the nearest structure atom carrying the same feature type */
            for ( j = 0; j < f1->atomCnt; j++ )
            {
                if ( !( smask[ strMap[j] ] & fMasks[k] ) )
                    continue;
                strFeature = smask[ strMap[j] ];
#if 0
                /* don't count attachment features in core mode */
                if ( q_coremode && ( strMap[j] == f1->copyBaseAtom || strMap[j]
                        == str2ndAttach ) )
                    continue;
#endif
                qryCords = q1->cords + (i*3);
                strCords = f1->cords + (j*3);
                x = *qryCords - *strCords;
                y = *(qryCords+1) - *(strCords+1);
                z = *(qryCords+2) - *(strCords+2);
                distsq = x*x + y*y + z*z;
                if ( distsq < best )
                {
                    best = distsq;
                    found = j;
                }
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# feature compare: %d %d type:%d distance: %7.4lf best: %7.4lf from: %d.%d\n",
                        i+1, j+1, k+1, sqrt(distsq), best, q1->id+1, f1->id+1
                        );
#endif
            }

            if ( found != -1 )
                covered[found] |= fMasks[k];

            attFact = 1.0;
            if ( best > 0.25 ) /* More than 0.5, this causes a penalty, best is
                                  a squared */
            {
                if ( q1->AtWts )
                {
                    if ( f1->AtWts && found != -1 )
                        attFact = ( q1->AtWts[i] + f1->AtWts[found] ) / 2.0;
                    else
                        attFact = q1->AtWts[i];
                }
                else if ( f1->AtWts && found != -1 )
                    attFact = f1->AtWts[found];

                if ( best > 3.0625 ) /* worst case distance is greater than 1.75 perfect
                                        mismatch (see GOLD/GASP papers) */
                {
                    featDiff = worst * attFact;
                    fieldDiff += featDiff;
                }
                else
                {
                    featDiff = worst * attFact * (( best - 0.25 ) / 2.8125 );
                    fieldDiff += featDiff;
                }
            }
            else
            {
                featDiff = 0.0;
            }

            if ( qfeature & FeatureHBA )
            {
#if 0
                fprintf(stderr,"HBA %d %d, origIdx: %8.2lf featDiff: %8.2lf\n",
                    i, origIdx, aa[origIdx], featDiff );
#endif
                if ( aa[origIdx] == -1.0 || aa[origIdx] > featDiff )
                {
                    aa[origIdx] = featDiff;
                    *r_fcnt += 1;
                }
            }

            if ( qfeature & FeatureHBD )
            {
#if 0
                fprintf(stderr,"HBD %d %d, origIdx: %8.2lf featDiff: %8.2lf\n",
                    i, origIdx, da[origIdx], featDiff );
#endif
                if ( da[origIdx] == -1.0 || da[origIdx] > featDiff )
                {
                    da[origIdx] = featDiff;
                    *r_fcnt += 1;
                }
            }
            if ( qfeature & FeaturePos || qfeature & FeatureNeg )
                otherDiff += featDiff;

#ifdef DEBUG_DETAIL
            if ( q_debugfp )
            {
                fprintf(q_debugfp,
                    "# feature q:%d s:%d ftype:%d best: %7.4lf a:%5.3lf worst: %11.2lf FieldDiff: %9.3lf\n",
                    i+1, found, qmask[ qMap[i] ], best, attFact, sqrt(worst),
                    fieldDiff);
            }
#endif
        }
    }

    /* Now for the extra feature penalty, count all non-covered features */
    for ( j = 0; j < f1->atomCnt; j++ )
    {
        if ( smask[ strMap[j] ] != FeatureNone )
        {
#if 0
            if ( q_coremode && ( strMap[j] == str->copyBaseAtom || strMap[j] ==
                    str2ndAttach ) )
                continue;
#endif
            strFeature = smask[ strMap[j] ];
            for ( k = 0; k < 4; k++ )
            {
                if ( !( strFeature & fMasks[k] ) )
                    continue;
                if ( !( covered[j] & fMasks[k] ) )
                {
                    worst = featureWeights[k+1] * ( (f1->AtWts) ? f1->AtWts[j]
                        : 1.0);
                    featDiff = (worst * worst * q_extraFeatureFactor );
                    otherDiff += featDiff;
                    extraFeatureCnt[k] += 1;
#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# missing feature %d,%d %d worst: %11.2lf FieldDiff: %9.3lf\n",
                            f1->id+1, j+1,
                            smask[ strMap[j] ], worst, fieldDiff);
#endif
                }
            }
        }
    }

    free((char *) covered );


    /* end of single atom, now do the aromatic rings */

    /* Find the 5 and 6 membered aromatic rings in the fragments, setup centroids for quick
       comparisons */
    if ( q1->aromCnt == -1 )
    {
        attFact = 1.0;
        q1->aromCnt = 0;
        for ( i = 0, qset = qs->aromSets; i < qs->numArom; i++, qset++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < q1->atomCnt; k++ )
            {
                if ( qset->atoms[ qMap[k] ] )
                {
                    if ( q1->AtWts )
                        attFact = q1->AtWts[k];
                    else
                        attFact = 1.0;
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }
            /* only rings lying wholly inside this fragment get a centroid */
            if ( qset->numAtoms && qset->numAtoms == cidx )
            {
                if ( !computeCentroid(q1->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(q1, cidx, attFact, x, y, z );
            }
        }
    }

    if ( f1->aromCnt == -1 )
    {
        f1->aromCnt = 0;
        attFact = 1.0;
        for ( i = 0, strSet = str->aromSets; i < str->numArom; i++, strSet++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < f1->atomCnt; k++ )
            {
                if ( strSet->atoms[ strMap[k] ] )
                {
                    if ( f1->AtWts )
                        attFact = f1->AtWts[k];
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }
            if ( strSet->numAtoms == cidx )
            {
                if ( !computeCentroid(f1->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(f1, cidx, attFact, x, y, z );
            }
        }
    }

    /* compare the query aromatic rings versus the structure's aromatic rings */
    for ( i = 0; i < q1->aromCnt; i++ )
    {
        best = 99999.0;
        found = 0;
        qryCords = q1->cent + (i*4);
        attFact = 1.0;
        worst = 20.0 * 20.0;
        for ( j = 0; j < f1->aromCnt; j++ )
        {
            strCords = f1->cent + (j*4);
            x = *qryCords - *strCords;
            y = *(qryCords+1) - *(strCords+1);
            z = *(qryCords+2) - *(strCords+2);
            distsq = x*x + y*y + z*z;
            if ( distsq < best )
            {
                found = j + 1;
                best = distsq;
                attFact = *(qryCords+3) * *(strCords+3);
            }
#ifdef DEBUG_DETAIL
            if ( q_debugfp )
                fprintf(q_debugfp, "# arom centroid dist: %8.3lf from: %d.%d\n",
                    sqrt(distsq), q1->id+1, f1->id+1 );
#endif
        }

        if ( best > 0.25 )
        {
            if ( best > 3.0625 ) /* worst case distance is greater than 1.75 perfect mismatch
                                    (see GOLD/GASP papers) */
                featDiff = worst * attFact;
            else
                featDiff = worst * attFact * (( best - 0.25 ) / 2.8125 );
            otherDiff += featDiff;
        }
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
            fprintf(q_debugfp,"# arom centroid q:%d,%d s:%d best:%8.3lf fieldDiff: %8.4lf \n",
                q1->id+1, i, f1->id+1,
                best, fieldDiff);
#endif
    }

    worst = featureWeights[0];
    worst *= worst;

    /* add in penalty for extra aromatic rings in the structure not in the query */
    if ( f1->aromCnt > q1->aromCnt )
        otherDiff += worst * 0.1 * (double) (f1->aromCnt - q1->aromCnt);

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "# arom Counts: query : %d structure : %d %s\n",
            q1->aromCnt, f1->aromCnt,
            (q1->aromCnt && q1->aromCnt == 0 ) ? "Missing some rings" : "" );
    }
#endif

    return otherDiff * q_featureFactor;
}
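The distance-to-penalty mapping that MeasureClosest applies to both single-atom features and ring centroids can be written as one small function. The sketch below is an illustration under assumptions, not code from the listing: feature_penalty is a hypothetical name, worst stands for the squared feature weight, and weight stands for the attFact term.

```c
/* Penalty for a feature whose nearest counterpart lies at squared distance dist_sq.
   No penalty up to a distance of 0.5 (0.25 when squared), the full penalty beyond
   a distance of 1.75 (3.0625 when squared), and a linear ramp in the squared
   distance in between. */
static double feature_penalty(double dist_sq, double worst, double weight)
{
    const double lo = 0.25;    /* 0.5 * 0.5 */
    const double hi = 3.0625;  /* 1.75 * 1.75 */

    if (dist_sq <= lo)
        return 0.0;
    if (dist_sq > hi)
        return worst * weight;
    return worst * weight * ((dist_sq - lo) / (hi - lo));
}
```

The divisor hi - lo equals 2.8125, the constant that appears in the listing, so the ramp reaches the full penalty exactly at the upper threshold and the function is continuous there.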

static double compareFeatures(Split *qs, Frag *qry, Split *ss, Frag *str, int qry2ndAttach, int
    str2ndAttach )
{
    int *qmask;
    int *smask;
    int i, j, k;
    double best = 99999.0;
    int found = -1;
    double worst;
    int qid, sid;
    int *qMap, *strMap;
    FeatureType qfeature, strFeature;
    double x, y, z;
    double distsq;
    double *qryCords, *strCords;
    double attFact;
    double fieldDiff = 0.0;
    double extraDiff = 0.0;
    int centAtoms[6];
    int cidx;
    AromSet *qset, *strSet;
    int *covered;
    static double featureContributions[4][MAX_FEATURES]; /* maximum of 200 features per type
                                should be more than enough, for the above 4 features */
    static int featureCnt[4];
    static int extraFeatureCnt[4];
    int fidx;

    featureCnt[0] = featureCnt[1] = featureCnt[2] = featureCnt[3] = 0;
    extraFeatureCnt[0] = extraFeatureCnt[1] = extraFeatureCnt[2] = extraFeatureCnt[3] = 0;
    qmask = qs->featureMask;
    smask = ss->featureMask;
    qMap = qry->origMapping;
    strMap = str->origMapping;

    if ( !qry->cords || !str->cords )
    {
        fprintf(stderr,"no coords: %d %d\n", qry->cords, str->cords);
        return 9999.0 * 9999.0;
    }

    covered = (int *) calloc(str->atomCnt,sizeof(int) );

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "\n# Feature comparison Query Id: %d Structure Id: %d\n",
            qry->id + 1, str->id + 1 );
    }
#endif


    /* do the single atom features first */
    for ( i = 0; i < qry->atomCnt; i++ )
    {
        if ( qmask[ qMap[i] ] == FeatureNone )
            continue; /* no single atom feature at this atom */

        qfeature = qmask[qMap[i]];
        for ( k = 0; k < 4; k++ )
        {
            if ( !( qfeature & fMasks[k] ) )
                continue;
            fidx = featureCnt[k];
            best = 999999.0;
            found = -1;
            worst = (double) featureWeights[k+1] * featureWeights[k+1];
            for ( j = 0; j < str->atomCnt; j++ )
            {
                if ( !( smask[ strMap[j] ] & fMasks[k] ) )
                    continue;
                /* don't count attachment features in core mode */
                if ( q_coremode && ( strMap[j] == str->copyBaseAtom || strMap[j]
                        == str2ndAttach ) )
                    continue;
                qryCords = qry->cords + (i*3);
                strCords = str->cords + (j*3);
                x = *qryCords - *strCords;
                y = *(qryCords+1) - *(strCords+1);
                z = *(qryCords+2) - *(strCords+2);
                distsq = x*x + y*y + z*z;
                if ( distsq < best )
                {
                    best = distsq;
                    found = j;
                }
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# feature compare: %d %d type:%d distance: %7.4lf best: %7.4lf from: %d.%d\n",
                        i+1, j+1, k+1, sqrt(distsq), best, qry->id+1,
                        str->id+1);
#endif
            }

            if ( found != -1 )
                covered[found] |= fMasks[k];

            attFact = 1.0;
            if ( best > 0.25 ) /* More than 0.5, this causes a penalty, best is
                                  a squared */
            {
                if ( qry->AtWts )
                {
                    if ( str->AtWts && found != -1 )
                        attFact = ( qry->AtWts[i] + str->AtWts[found] ) / 2.0;
                    else
                        attFact = qry->AtWts[i];
                }
                else if ( str->AtWts && found != -1 )
                    attFact = str->AtWts[found];

                if ( best > 3.0625 ) /* worst case distance is greater than 1.75 perfect
                                        mismatch (see GOLD/GASP papers) */
                {
                    fieldDiff += worst * attFact;
                    featureContributions[k][fidx] = worst * attFact;
                }
                else
                {
                    fieldDiff += worst * attFact * (( best - 0.25 ) / 2.8125 );
                    featureContributions[k][fidx] = worst * attFact * (( best - 0.25 )
                        / 2.8125 );
                }
            }
            else
            {
                featureContributions[k][fidx] = 0.0;
            }

            if ( featureCnt[k] < (MAX_FEATURES - 1) )
                featureCnt[k] += 1; /* just to avoid core dumps, don't increment if full
                                     */
#ifdef DEBUG_DETAIL
            if ( q_debugfp )
            {
                fprintf(q_debugfp,
                    "# feature q:%d s:%d ftype:%d best: %7.4lf a:%5.3lf worst:%11.2lf FieldDiff: %9.3lf\n",
                    i+1, found, qmask[ qMap[i] ], best, attFact, sqrt(worst),
                    fieldDiff);
            }
#endif
        }
    }

/* Now for the extra feature penalty, count all non-covered features */ 
    for ( j = 0; j < str->atomCnt; j++ )
    {
        if ( smask[ strMap[j] ] != FeatureNone )
        {
            if ( q_coremode && ( strMap[j] == str->copyBaseAtom || strMap[j] == str2ndAttach ) )
                continue;
            strFeature = smask[ strMap[j] ];
            for ( k = 0; k < 4; k++ )
            {
                if ( !( strFeature & fMasks[k] ) )
                    continue;
                if ( !( covered[j] & fMasks[k] ) )
                {
                    worst = featureWeights[k+1] * ( (str->AtWts) ? str->AtWts[j] : 1.0 );
                    fieldDiff += ( worst * worst * q_extraFeatureFactor );
                    extraDiff += ( worst * worst * q_extraFeatureFactor );
                    extraFeatureCnt[k] += 1;
#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# missing feature %d,%d %d worst:%11.2lf FieldDiff: %9.3lf\n",
                            str->id+1, j + 1, smask[ strMap[j] ], worst, fieldDiff );
#endif
                }
            }
        }
    }
    free((char *) covered );

    /* Almost the end of the single atom features. If autoscaling is on for features, let's ignore the
       featureDiff calculated so far.
       Auto scaling for features is NOT based upon hev atom count. It's based upon the number of
       features by type.
    */
    if ( q_partialMatch )
    {
        fieldDiff = featureScaling(featureCnt, extraFeatureCnt, (double *) featureContributions,
                                   q_partialMatch );
        fieldDiff += extraDiff;
    }

    /* end of single atom, now do the aromatic rings */

    /* Find the 5 and 6 membered aromatic rings in the fragments, setup centroids for quick
       comparisons */
    if ( qry->aromCnt == -1 )
    {
        attFact = 1.0;
        qry->aromCnt = 0;
        for ( i = 0, qset = qs->aromSets; i < qs->numArom; i++, qset++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < qry->atomCnt; k++ )
            {
                if ( qset->atoms[ qMap[k] ] )
                {
                    if ( qry->AtWts )
                        attFact = qry->AtWts[k];
                    else
                        attFact = 1.0;
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }
            if ( qset->numAtoms && qset->numAtoms == cidx )
            {
                if ( !computeCentroid(qry->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(qry, cidx, attFact, x, y, z );
            }
        }
    }

    if ( str->aromCnt == -1 )
    {
        str->aromCnt = 0;
        attFact = 1.0;
        for ( i = 0, strSet = ss->aromSets; i < ss->numArom; i++, strSet++ )
        {
            for ( k = cidx = 0; cidx < 6 && k < str->atomCnt; k++ )
            {
                if ( strSet->atoms[ strMap[k] ] )
                {
                    if ( str->AtWts )
                        attFact = str->AtWts[k];
                    centAtoms[cidx] = k;
                    cidx++;
                }
            }
            if ( strSet->numAtoms == cidx )
            {
                if ( !computeCentroid(str->cords, centAtoms, cidx, &x, &y, &z ) )
                    addCentroid(str, cidx, attFact, x, y, z );
            }
        }
    }

    /* compare the query aromatic rings versus the structure's aromatic rings */
    for ( i = 0; i < qry->aromCnt; i++ )
    {
        best = 99999.0;
        found = 0;
        qryCords = qry->cent + (i*4);
        attFact = 1.0;
        worst = 20.0 * 20.0;
        for ( j = 0; j < str->aromCnt; j++ )
        {
            strCords = str->cent + (j*4);
            x = *qryCords - *strCords;
            y = *(qryCords+1) - *(strCords+1);
            z = *(qryCords+2) - *(strCords+2);
            distsq = x*x + y*y + z*z;
            if ( distsq < best )
            {
                found = j + 1;
                best = distsq;
                attFact = *(qryCords+3) * *(strCords+3);
            }
#ifdef DEBUG_DETAIL
            if ( q_debugfp )
                fprintf(q_debugfp, "# arom centroid dist: %8.3lf from: %d.%d \n",
                    sqrt(distsq), qry->id+1, str->id+1 );
#endif
        }

        if ( best > 0.25 )
        {
            if ( best > 3.0625 ) /* worst case distance is greater than 1.75 perfect mismatch
                                    (see GOLD/GASP papers) */
                fieldDiff += worst * attFact;
            else
                fieldDiff += worst * attFact * (( best - 0.25 ) / 2.8125 );
        }
#ifdef DEBUG_DETAIL
        if ( q_debugfp )
            fprintf(q_debugfp, "# arom centroid q:%d,%d s:%d best:%8.3lf fieldDiff: %8.4lf\n",
                qry->id+1, i, str->id+1, best, fieldDiff);
#endif
    }

worst = 20.0 * 20.0; 

/* add in penalty for extra aromatic rings in the structure not in the query */ 
    if ( str->aromCnt > qry->aromCnt )
        fieldDiff += worst * 0.1 * (double) (str->aromCnt - qry->aromCnt);

#ifdef DEBUG_DETAIL
    if ( q_debugfp )
    {
        fprintf(q_debugfp, "# arom Counts: query : %d structure : %d %s\n",
            qry->aromCnt, str->aromCnt,
            (str->aromCnt && qry->aromCnt == 0 ) ? "Missing some rings" : "" );
    }
#endif

return fieldDiff * q_featureFactor; 

} 
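
The distance penalty applied above to both single-atom features and aromatic centroids can be read as a ramp on the squared distance: nothing at or below 0.25 (0.5 squared), the full `worst * attFact` above 3.0625 (1.75 squared), and a linear blend in between. The following is only a minimal sketch of that reading; the helper names `feature_penalty` and `nearly_equal` are hypothetical and do not appear in the listing.

```c
#include <assert.h>

/* Thresholds on the squared distance, taken from the listing:
   0.25 = 0.5^2 (no penalty at or below), 3.0625 = 1.75^2 (full penalty above). */
#define NEAR_SQ 0.25
#define FAR_SQ  3.0625

/* Hypothetical helper: penalty for one query feature whose nearest
   structure feature lies at squared distance distsq. */
static double feature_penalty(double distsq, double worst, double att_fact)
{
    assert(distsq >= 0.0);
    if (distsq <= NEAR_SQ)
        return 0.0;                       /* close enough: no penalty */
    if (distsq > FAR_SQ)
        return worst * att_fact;          /* treated as a complete mismatch */
    /* FAR_SQ - NEAR_SQ is the 2.8125 divisor used in the listing */
    return worst * att_fact * ((distsq - NEAR_SQ) / (FAR_SQ - NEAR_SQ));
}

/* Hypothetical helper for comparing doubles in the checks below. */
static int nearly_equal(double a, double b)
{
    double d = a - b;
    return (d < 0.0 ? -d : d) < 1e-9;
}
```

With `worst = 400.0` (the 20.0 * 20.0 used for aromatic rings), a squared distance of 1.65625 sits halfway along the ramp and would contribute half of the full penalty.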

/* 

The data is in FeaturePos, FeatureNeg, FeatureHBA and FeatureHBD order 

*/ 

static double featureScaling(int *featureCnts, int *extraFeatureCnts, double *featureContributions,
                             int nbest )
{
    static double *thebest;
    static int maxBest;
    double lowest, clowest;
    int cnt, lowidx;
    double dval;
    double fieldDiff = 0.0;
    double featDiff;
    double fieldIgnored = 0.0;
    double fieldFact;
    int k, idx, j, fidx;

    if ( !thebest || nbest > maxBest )
    {
        if ( thebest )
            free((char *) thebest );
        thebest = (double *) malloc(sizeof(double) * nbest );
        maxBest = nbest;
    }

    featDiff = 0.0;
    for ( k = 0; k < 4; k++ )
    {
        if ( featureCnts[k] == 0 )
            continue;
        /* Find the N lowest contributing features by type.
           Think of this as partial match feature matching, like Unity's
           flexible searching.
        */
        for ( featDiff = 0.0, lowidx = -1, lowest = 999999999.0, cnt = idx = 0;
              idx < featureCnts[k]; idx++ )
        {
            fidx = (k * MAX_FEATURES) + idx;
            dval = featureContributions[fidx];
            featDiff += dval;
            if ( dval < lowest || cnt < nbest )
            {
                if ( cnt < nbest )
                {
                    if ( dval < lowest )
                    {
                        lowest = dval;
                        lowidx = cnt;
                    }
                    thebest[cnt] = dval;
                    cnt++;
                }
                else
                {
                    thebest[lowidx] = dval;
                    lowest = dval;
                    for ( j = 0; j < nbest; j++ )
                        if ( thebest[j] < lowest )
                        {
                            lowest = thebest[j];
                            lowidx = j;
                        }
                }
            }
        }


        /* we are looking at donors and acceptors */
        if ( cnt > 0 )
        {
            if ( k > 1 )
            {
                fieldFact = 2.0 / (double) cnt; /* Mainly to increase the importance
                                                   when only one donor or acceptor exists */
                if ( fieldFact < 0.9 )
                    fieldFact = 0.9;
                for ( j = 0; j < cnt; j++ )
                {
                    fieldDiff += thebest[j] * fieldFact;
                    featDiff -= thebest[j];
                }
#if 0
                fieldFact = (1.0 / ( (double) (cnt+2) * (double) (cnt+1) ) );
#else
                fieldFact = 0.0;
#endif
                if ( cnt > 2 )
                    fieldFact *= 0.5;
                fieldDiff += fieldFact * featDiff;
#if 0
                fprintf(stderr, "field: %8.2lf best: %8.2lf remain:%8.2lf \n", fieldDiff,
                    thebest[0], featDiff );
#endif
            }
            else
            {
                fieldDiff += featDiff;
            }

#if 0
            if ( featureCnts[k] > 2 )
            {
                /* so now what do we do about the 5th - Nth fields.
                   Should they or shouldn't they contribute */
                for ( fieldIgnored = 0.0, j = cnt; j < featureCnts[k]; j++ )
                {
                    fidx = ( k * MAX_FEATURES ) + j;
                    fieldIgnored += featureContributions[fidx];
                }
                fprintf(stderr, "Field ignored total: %8.2lf sqrt is: %8.2lf\n",
                    fieldIgnored, sqrt(fieldIgnored) );
                fprintf(stderr, "type: %d cnt: %d ", k, featureCnts[k] );
                for ( j = 0; j < featureCnts[k]; j++ )
                {
                    fidx = ( k * MAX_FEATURES ) + j;
                    fprintf(stderr, "%7.2lf ", featureContributions[fidx] );
                }
                fprintf(stderr, "\nBest %d: ", cnt);
                for ( j = 0; j < cnt; j++ )
                    fprintf(stderr, "%7.2lf ", thebest[j] );
                fprintf(stderr, "\n");
            }
#endif

} 

} 

return fieldDiff; 

} 
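
The comment in featureScaling describes the goal as finding the N lowest contributing features of one type, and the donor/acceptor branch then weights those by 2.0 / cnt with a floor of 0.9. The sketch below illustrates that stated intent only. It is a simplification that sorts a copy of the contributions rather than tracking candidates incrementally as the listing does, and `partial_match_sum`, `cmp_double` and `nearly_equal` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Ascending comparison for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a, y = *(const double *) b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sum of the nbest smallest contributions of one
   feature type, scaled by max(0.9, 2.0 / cnt) as in the donor/acceptor
   branch of the listing. Returns -1.0 if memory cannot be allocated. */
static double partial_match_sum(const double *contrib, int count, int nbest)
{
    double *sorted, factor, sum = 0.0;
    int i, cnt;

    if (!contrib || count <= 0 || nbest <= 0)
        return 0.0;
    sorted = malloc(sizeof(double) * (size_t) count);
    if (!sorted)
        return -1.0;
    memcpy(sorted, contrib, sizeof(double) * (size_t) count);
    qsort(sorted, (size_t) count, sizeof(double), cmp_double);

    cnt = (count < nbest) ? count : nbest;   /* cannot keep more than exist */
    factor = 2.0 / (double) cnt;             /* boosts a lone donor/acceptor */
    if (factor < 0.9)
        factor = 0.9;
    for (i = 0; i < cnt; i++)
        sum += sorted[i] * factor;
    free(sorted);
    return sum;
}

/* Hypothetical helper for comparing doubles in the checks below. */
static int nearly_equal(double a, double b)
{
    double d = a - b;
    return (d < 0.0 ? -d : d) < 1e-9;
}
```

Sorting costs more than the single pass in the listing, but for the handful of features per fragment it keeps the selection step easy to verify by eye.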

static int SearchForFeatures(Split *S)
{
    int aromHit, featureHit;
    int numFeatures;
    FeaturePattern *fptr;
    int oxygen, nitrogen, sulfur;
    int ring_oxygen, ring_nitrogen, ring_sulfur;
    int nonSingleRingBond;
    CtAtom *atom;
    CtBond *bond;
    int i, j, k;
    int bcnt;
    int strInit;
    CtBondTypeDef bondType;
    CtSimpleBondTypeDef simpleTypes;
    struct Srch2Hits *hits;
    int nhits, hitidx;
    int atomId;
    int *atoms;
    int nonSingleRingBonds;
    AromSet *aset;
    int alreadyFound;
    char *regid;

    if ( !S || !S->ct )
        return -1;

    aromHit = featureHit = 0;
    oxygen = nitrogen = sulfur = nonSingleRingBond = 0;
    ring_oxygen = ring_nitrogen = ring_sulfur = 0;
    regid = (char *) 0;
    DB_CT_GET_CT_ATTR(S->ct, CtCtRegId, &regid );

    fptr = InitFeaturePatterns(&numFeatures); /* it won't re-initialize */

    DB_CT_UTL_FIND_RINGS(S->ct);

    for ( i = 0, atom = S->ct->atoms; i < S->ct->atomCount; i++, atom++ )
    {
        if ( atom->class != CtAtomElement )
            continue;
        if ( atom->id.atomicNumber == OXYGEN )
        {
            oxygen++;
            if ( AB_IN_RING(atom) )
                ring_oxygen++;
        }
        else if ( atom->id.atomicNumber == NITROGEN )
        {
            nitrogen++;
            if ( AB_IN_RING(atom) )
                ring_nitrogen++;
        }
        else if ( atom->id.atomicNumber == SULFUR )
        {
            sulfur++;
            if ( AB_IN_RING(atom) )
                ring_sulfur++;
        }
    }

    for ( i = nonSingleRingBonds = 0, bond = S->ct->bonds;
          i < S->ct->bondCount && nonSingleRingBonds == 0;
          i++, bond++ )
    {
        if ( AB_IN_RING(bond) )
        {
            if ( bond->simpleBondType == CtSimpleBondTypeNotSimple )
            {
                bondType = DB_CT_GET_BOND_TYPE(S->ct, STD_ID(i), &bcnt,
                                               &simpleTypes );
                if ( bondType != CtBondTypeSingle )
                    nonSingleRingBonds++;
            }
            else if ( bond->simpleBondType != CtSimpleBondTypeSingle )
                nonSingleRingBonds++;
        }
    }

S->numArom = 0; 
S->aromSets = (AromSet *) 0; 

    S->featureMask = (int *) calloc(sizeof(int), S->atomCount );
    if ( nonSingleRingBonds )
        S->aromMask = (int *) calloc(sizeof(int), S->atomCount );

    for ( i = strInit = 0; i < numFeatures; i++, fptr++ )
    {
        if ( fptr->weight == 0 )
            continue; /* think of it as commented out */
        if ( fptr->f_type == FeatureArom && nonSingleRingBonds == 0 )
            continue; /* Can't hit the feature aromatic, no non-single ring bonds */
        if ( q_useFeatureCharges == 0 && ( fptr->f_type == FeaturePos || fptr->f_type == FeatureNeg ) )
            continue;
        if ( fptr->atomicId > 0 )
        {
            if ( fptr->atomicId == OXYGEN && ( oxygen == 0 || fptr->ringIndicator == 1 && ring_oxygen == 0 ) )
                continue;
            if ( fptr->atomicId == NITROGEN && ( nitrogen == 0 || fptr->ringIndicator == 1 && ring_nitrogen == 0 ) )
                continue;
            if ( fptr->atomicId == SULFUR && ( sulfur == 0 || fptr->ringIndicator == 1 && ring_sulfur == 0 ) )
                continue;
        }

        hits = DB_SRCH2_SEARCH_PATTERN( fptr->pattern, S->ct, strInit );
        strInit = 1;
        nhits = 0;
        if ( hits )
        {
            nhits = DB_SRCH2_GET_HIT_COUNT(hits);
            if ( !nhits )
                DB_SRCH2_FREE_HITS(hits);
        }
        if ( !nhits )
            continue;

        atoms = (int *) 0;
        /* get the atoms which matched and store accordingly, depending upon the feature type */
        if ( fptr->f_type == FeatureArom )
        {
            for ( hitidx = 0; hitidx < nhits; hitidx++ )
            {
                atoms = (int *) calloc(S->ct->atomCount, sizeof(int) );
                /* store the atoms which define the centroid */
                for ( j = 1; j <= fptr->ct->atomCount; j++ )
                {
                    atomId = DB_SRCH2_GET_ATOM_MAPPING(j, hits, hitidx, 0);
                    if ( !atomId )
                    {
                        UTL_ERROR_CLEAR();
                        continue;
                    }
                    atomId--;
#ifdef DEBUG_DETAIL
                    if ( q_debugfp )
                        fprintf(q_debugfp, "# feature %s atom:%d ftype:%d rule:%d\n",
                            regid, atomId+1, (int) fptr->f_type, i+1 );
#endif
                    S->aromMask[atomId] = fptr->weight;
                    atoms[atomId] = fptr->weight;
                }
                S->aromSets = (AromSet *) DB_CT_UTL_RECALLOC((char *) S->aromSets,
                                  S->numArom * sizeof(AromSet),
                                  (S->numArom+1) * sizeof(AromSet) );
                aset = S->aromSets + S->numArom;
                S->numArom++;
                aset->atoms = atoms;
                aset->numAtoms = fptr->ct->atomCount;
            }
        }
        else
        {
            for ( hitidx = 0; hitidx < nhits; hitidx++ )
            {
                atomId = DB_SRCH2_GET_ATOM_MAPPING(1, hits, hitidx, 0 );
                if ( !atomId )
                {
                    UTL_ERROR_CLEAR();
                    DB_SRCH2_FREE_HITS(hits);
                    continue;
                }
                atomId--; /* make it base 0 */
#ifdef DEBUG_DETAIL
                if ( q_debugfp )
                    fprintf(q_debugfp, "# feature %s atom:%d ftype:%d rule:%d\n",
                        regid, atomId+1, (int) fptr->f_type, i+1 );
#endif
                S->featureMask[atomId] |= fptr->f_type;
            }
        }
        DB_SRCH2_FREE_HITS(hits);
    }

return 0; 

} 
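
Before running each substructure pattern, SearchForFeatures skips patterns that cannot match: the pattern's key element must be present, and if the pattern requires a ring atom, at least one atom of that element must be in a ring. A minimal standalone sketch of that pre-filter follows. The `ElementCount` type and the functions `count_element` and `pattern_can_match` are hypothetical names for illustration, and plain integer arrays stand in for the connection table, which is an assumption.

```c
#include <assert.h>

enum { ELEM_NITROGEN = 7, ELEM_OXYGEN = 8, ELEM_SULFUR = 16 };

/* Hypothetical tally for one element in one structure. */
typedef struct {
    int total;    /* atoms of this element */
    int in_ring;  /* of those, how many are ring members */
} ElementCount;

/* Count atoms of `element`; in_ring[i] is nonzero when atom i is in a ring. */
static ElementCount count_element(const int *atomic_numbers, const int *in_ring,
                                  int natoms, int element)
{
    ElementCount c = { 0, 0 };
    int i;

    assert(atomic_numbers && in_ring);
    for (i = 0; i < natoms; i++) {
        if (atomic_numbers[i] != element)
            continue;
        c.total++;
        if (in_ring[i])
            c.in_ring++;
    }
    return c;
}

/* Returns 1 when a pattern keyed on atomic_id is worth searching, else 0.
   atomic_id <= 0 means the pattern names no key element. */
static int pattern_can_match(int atomic_id, int ring_indicator,
                             const ElementCount *counts)
{
    if (atomic_id <= 0)
        return 1;
    if (counts->total == 0)
        return 0;
    if (ring_indicator == 1 && counts->in_ring == 0)
        return 0;
    return 1;
}
```

For a pyridine-like ring (five carbons and one ring nitrogen), a ring-nitrogen pattern passes the filter while any oxygen-keyed pattern is skipped without a search.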


static FeaturePattern *InitFeaturePatterns(int *r_numPatterns)
{
    static Srch2Control sctrl[1];
    static int numPatterns;
    struct CtConnectionTable *ct;
    FeaturePattern *fptr;
    FeaturePattern *fpats;
    static FeatureSetName currentSet;

    static FeaturePattern Unityfpats[] = {
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev=[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeaturePos, 200, 0, 0, "Any[+;not=Any*~Any[-]]" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*Hev:=#Any,N*O](Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*~Any[-]](Any)(Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 1,
      "N[1:NOT=N*~Any[-1]](:Hev:Hev:Hev:Hev:Hev:@1)Any[not=O[f]-N]" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[not=N*~Any[-],N(=O)~O[f]](=Any)(~Any)~Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[f;not=N*Hev:=#Any](Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[F](Hc)(Hc)C(=N[F]Hc)Any[IS=C*,N*[f](Any[is=H,C])(Any[is=H,C])(Any[is=H,C])]{Hc:H,C[NOT=C*=#Any]}" },
    { FeaturePos, 200, NITROGEN, 0,
      "C[1:F](:N[F]:C(:C(:N:@1Hc)Any)Any)Any{Hc:H|C[NOT=C*=:#Any]}" },
    { FeaturePos, 200, NITROGEN, 1, "N[1:f](C):C:N[f](C):C:C:@1" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]-:Hev[is=C*=:O,S*(=:O)(=:O)]" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])OHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)OHev" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H]P[f](=O)C" },
    { FeatureNeg, 200, NITROGEN, 1,
      "Any[is=C[1]:NH:N:N:N:@1,C[1]:N:NH:N:N:@1,C[1]:N:N:NH:N:@1,C[1]:N:N:N:NH:@1]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]=Any[not=S,P,N(=O[f])~O[f]](Any)Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~Any[is=S,P](Any[not=O])Any[not=O]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](:Any):Any" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:N[f]:Z:Z:Z:@1{Z:C,N}" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:C:N[f]:Any:Any:@1" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]C:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]HC[not=C=Any]-:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f](Z)Z{Z:C[not=C=Any]}" },
    { FeatureHBA, 100, OXYGEN, 1, "O[1:f]-:Z=:Z-:Z=:Z-:@1{Z:Any[is=C,N]}" },
    { FeatureHBA, 100, OXYGEN, 1,
      "O[not=O[1]Any[is=C,N]=Any[is=C,N]Any[is=C,N]=Any[is=C,N]@1](Any)C=Any[is=C,N]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f]H(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O|H}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f](Z)(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f](Any[is=H,C])C=O" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]-:C~O[f]" },
    { FeatureHBA, 100, NITROGEN, 0, "NH=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](~Hev)=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f](=C[is=NC*N,NC*C,NC*H])Hev[is=Hev=O,Hev=S,C#N,CN(~O[f])~O[f]]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~N(Any)~O[f]" },
    { FeatureHBA, 100, OXYGEN, 0, "O~Any[is=S,P](~O)~O " },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]H~[!type=3]Any" },
    { FeatureHBD, 100, OXYGEN, 0, "OHAny[not=C=O]" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](Hev[not=Any=O,Any=S,C#N,N(~O[f])~O[f]])=C" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](:C[1:not=COH,CSH]):C[not=COH,CSH]:C:C[not=COH,CSH]:C:@1" },
    { FeatureHBD, 100, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any:Any:N(Any):Any:@1" },
    { FeatureHBD, 100, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any[1:not=N]:Any[is=C,N]:Any[is=C,N]:NH:@1" },
    { FeatureHBD, 100, NITROGEN, 0, "N[f](:C(Any[is=O,S]H)):Any:Any:Any" },
    { FeatureHBD, 100, NITROGEN, 0, "N[f](:C:C:C(Any[is=O,S]H)):Any" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](Ya)(Ya)Ya{Ya:Any[not=H,C=O,C=N,S(=O)(=O)Any]}" },
    { FeatureHBD, 100, OXYGEN, 0, "O[f]~Any[is=S,P](~OH)(~O)" },
    { FeatureHBD, 100, SULFUR, 0,
      "S[f]HZ{Z:C[not=C=O]|S[not=S~O]|N[not=N~O] }" },
    { FeatureNone, -1, 0, 0, (char *) 0 }
    };

    static FeaturePattern Unityfpats_WeLike[] = {
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev=[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeaturePos, 200, 0, 0, "Any[+;not=Any*~Any[-]]" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*Hev:=#Any,N*O](Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*~Any[-]](Any)(Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 1,
      "N[1:NOT=N*~Any[-1]](:Hev:Hev:Hev:Hev:Hev:@1)Any[not=O[f]-N]" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[not=N*~Any[-],N(=O)~O[f]](=Any)(~Any)~Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[f;not=N*Hev:=#Any](Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[F](Hc)(Hc)C(=N[F]Hc)Any[IS=C*,N*[f](Any[is=H,C])(Any[is=H,C])(Any[is=H,C])]{Hc:H,C[NOT=C*=#Any]}" },
    { FeaturePos, 200, NITROGEN, 0,
      "C[1:F](:N[F]:C(:C(:N:@1Hc)Any)Any)Any{Hc:H|C[NOT=C*=:#Any]}" },
    { FeaturePos, 200, NITROGEN, 1, "N[1:f](C):C:N[f](C):C:C:@1" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]-:Hev[is=C*=:O,S*(=:O)(=:O)]" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])OHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)OHev" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H]P[f](=O)C" },
    { FeatureNeg, 200, NITROGEN, 1,
      "Any[is=C[1]:NH:N:N:N:@1,C[1]:N:NH:N:N:@1,C[1]:N:N:NH:N:@1,C[1]:N:N:N:NH:@1]" },

    { FeatureHBA, 100, OXYGEN, 0, "O[f]=Any[not=S,P,N(=O[f])~O[f]](Any)Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~Any[is=S,P](Any[not=O])Any[not=O]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](:Any):Any" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:N[f]:Z:Z:Z:@1{Z:C,N}" },
    { FeatureHBA, 100, NITROGEN, 1, "N[1]H:C:N[f]:Any:Any:@1" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]C:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]HC[not=C=Any]-:Any" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f](Z)Z{Z:C[not=C=Any]}" },
    { FeatureHBA, 100, OXYGEN, 1, "O[1:f]-:Z=:Z-:Z=:Z-:@1{Z:Any[is=C,N]}" },
    { FeatureHBA, 100, OXYGEN, 1,
      "O[not=O[1]Any[is=C,N]=Any[is=C,N]Any[is=C,N]=Any[is=C,N]@1](Any)C=Any[is=C,N]" },
    { FeatureHBA, 0, NITROGEN, 0,
      "N[f]H(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O|H}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 0, NITROGEN, 0,
      "N[f](Z)(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O}{Zz:H|N|O|C[not=C:=Hev]}" },
    { FeatureHBA, 0, OXYGEN, 0, "O[f](Any[is=H,C])C=O" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]-:C~O[f]" },
    { FeatureHBA, 100, NITROGEN, 0, "NH=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[f](~Hev)=C[not=CN]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[f](=C[is=NC*N,NC*C,NC*H])Hev[is=Hev=O,Hev=S,C#N,CN(~O[f])~O[f]]" },
    { FeatureHBA, 100, OXYGEN, 0, "O[f]~N(Any)~O[f]" },
    { FeatureHBA, 100, OXYGEN, 0, "O~Any[is=S,P](~O)~O " },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]H~[!type=3]Any" },
    { FeatureHBD, 100, OXYGEN, 0, "OHAny[not=C=O]" },
    { FeatureHBD, 100, NITROGEN, 0,
      "N[f](Hev[not=Any=O,Any=S,C#N,N(~O[f])~O[f]])=C" },
    { FeatureHBD, 0, NITROGEN, 0,
      "N[f](:C[1:not=COH,CSH]):C[not=COH,CSH]:C:C[not=COH,CSH]:C:@1" },
    { FeatureHBD, 0, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any:Any:N(Any):Any:@1" },
    { FeatureHBD, 0, NITROGEN, 1,
      "N[1:f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any[1:not=N]:Any[is=C,N]:Any[is=C,N]:NH:@1" },
    { FeatureHBD, 0, NITROGEN, 0, "N[f](:C(Any[is=O,S]H)):Any:Any:Any" },
    { FeatureHBD, 0, NITROGEN, 0, "N[f](:C:C:C(Any[is=O,S]H)):Any" },
    { FeatureHBD, 0, NITROGEN, 0,
      "N[f](Ya)(Ya)Ya{Ya:Any[not=H,C=O,C=N,S(=O)(=O)Any]}" },
    { FeatureHBD, 100, OXYGEN, 0, "O[f]~Any[is=S,P](~OH)(~O)" },
    { FeatureHBD, 100, SULFUR, 0,
      "S[f]HZ{Z:C[not=C=O]|S[not=S~O]|N[not=N~O] }" },
    { FeatureNone, -1, 0, 0, (char *) 0 }
    };


/* 

From Sybyl 6.71/Unity 4.21 $TA_3DB/sln3d_macros.def 

The structure above assumes first atom is the important atom, so write the SLN the correct way the
first time.

define:: Donor_Atom[name; target; rules; connection]
sln=N[not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]H~[!type=3]Any;
features=1;
sln=OHAny[not=C=O];
features=1;
sln=N[f](Hev[not=Any=O,Any=S,C#N,N(~O[f])~O[f]])=C;
features=1;
sln=C[1:not=COH,CSH]:N[f]:C[not=COH,CSH]:C:C[not=COH,CSH]:C:@1;
features=2;
sln=Any[1]:N(Any):Any:N[f;not=C[1]:N*:N:N:N:@1,C[1]:N:N*:N:N:@1]:Any:@1;
features=5;
sln=Any[1:not=N]:Any[is=C,^
@1]:@1;
features=6;
sln=Any:Any:Any:N[f]:C(Any[is=O,S]H);
features=4;
sln=Any:N[f]:C:C:C(Any[is=O,S]H);
features=2;
sln=N[f](Ya)(Ya)Ya{Ya:Any[not=H,C=O,C=N,S(=O)(=O)Any]};
features=1;
sln=O[f]~Any[is=S,P](~OH)(~O);
features=1;
sln=S[f]HZ{Z:C[not=C=O]|S[not=S~O]|N[not=N~O] };
features=1;
features=::name::_DL_1,
end_define

define:: Acceptor_Atom[name; target; rules; connection]
sln=O[f]=Any[not=S,P,N(=O[f])~O[f]](Any)Any;
features=1;
sln=O[f]~Any[is=S,P](Any[not=O])Any[not=O];
features=1;
sln=Any:N[f]:Any;
features=2;
sln=Z[1]:Z:Z:NH:N[f]:@1{Z:C,N};
features=4;
sln=Any[1]:NH:C:N[f]:Any:@1;
features=2;
sln=O[f]C:Any;
features=1;
sln=O[f]HC[not=C=Any]-:Any;
features=1;
sln=ZO[f]Z{Z:C[not=C=Any]};
features=2;
sln=Z[1]-:O[f]-:Z=:Z-:Z=:@1{Z:Any[is=C,N]};
features=2;
sln=O[not=O[1]Any[is=C,N]=Any[is=C,N]Any[is=C,N]=Any
features=1;
sln=N[f]H(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O|H}{Zz:H|N|O|C[not=C:=Hev]};
features=1;
sln=N[f](Z)(Z)C[not=C=Het;is=C:Any,CHevN[f](Zz)Zz]{Z:C[not=C=Het]|N[not=NC=Het]|O[not=OC=O]|S(=O)=O}{Zz:H|N|O|C[not=C:=Hev]};
features=1;
sln=O[f](Any[is=H,C])C=O;
features=1;
sln=O[f]-:C~O[f];
features=1;
sln=NH=C[not=CN];
features=1;
sln=Hev~N[f]=C[not=CN];
features=2;
sln=Hev[is=Hev=O,Hev=S,C#N,CN(~O[f])~O[f]]N[f]=C[is=NC*N,NC*C,NC*H];
features=2;
sln=AnyN(~O[f])~O[f];
features=3;
sln=O~Any[is=S,P](~O)~O;
features=1;
features=::name::_AL_1,
end_define

*/ 

static FeaturePattern orig_top_fpats[] = { 

    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev=[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:Hev:Hev:Hev:Hev:@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]=[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeatureArom, 20, 0, 0, "Hev[1]:[r]Hev-[r]Hev=[r]Hev-[r]Hev-[r]@1" },
    { FeaturePos, 200, 0, 0, "Any[+;not=Any*~Any[-]]" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*Hev:=#Any,N*O](Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[not=N*~Any[-]](Any)(Any)(Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[1:NOT=N*~Any[-1]](:Hev:Hev:Hev:Hev:Hev:@1)Any[not=O[f]-N]" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[not=N*~Any[-],N(=O)~O[f]](=Any)(~Any)~Any" },
    { FeaturePos, 200, NITROGEN, 0, "N[f;not=N*Hev:=#Any](Any)Any" },
    { FeaturePos, 200, NITROGEN, 0,
      "N[F](Hc)(Hc)C(=N[F]Hc)Any[IS=C*,N*[f](Any[is=H,C])(Any[is=H,C])(Any[is=H,C])]{Hc:H,C[NOT=C*=#Any]}" },
    { FeaturePos, 200, NITROGEN, 1,
      "C[1:F](:N[F]:C(:C(:N:@1Hc)Any)Any)Any{Hc:H|C[NOT=C*=:#Any]}" },
    { FeaturePos, 200, NITROGEN, 1, "N[1:f](C):C:N[f](C):C:C:@1" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]-:Hev[is=C*=:O,S*(=:O)(=:O)]" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])OHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)OHev" },
    { FeatureNeg, 200, OXYGEN, 0,
      "O[is=O*H,O*[f]Hev]P(=O)(O[is=O*H,O*[f]Hev])CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H,O*[f]Hev]P(=O)(OHev)CHev" },
    { FeatureNeg, 200, OXYGEN, 0, "O[is=O*H]P[f](=O)C" },
    { FeatureNeg, 200, NITROGEN, 1,
      "Any[is=C[1]:NH:N:N:N:@1,C[1]:N:NH:N:N:@1,C[1]:N:N:NH:N:@1,C[1]:N:N:N:NH:@1]" },

{ FeatureHBA, 100, OXYGEN, 0, 
"0[is=0* = Any,0(Any)Any,0[qAny,0[fl(H)C=0,0[f]C=0;not=0* = :-N,0*[!r](Hev)Any=Het]"}, 

{ FeatureHBA, 100, OXYGEN, 0, "0[is=0* = NO,0*N=0,0*=N=0,0:N:0]" }, 

{ FeatureHBA, 100, NITROGEN, 0, 
"N[is=N*(Any)(Any)Any,N*(Any)Any,N^ 

N[f]:C:C:@l,N*[l:f]H:N[f]:C:C:C:@l,N*[l:f]H:N[f]:N[i]:C:C:@l,Nni:f]H:N[f]:C:N:C^ 
f]H;N[f]:N[f]:N[f]:C:@l,N*H=C,N*[f](Any)=C;not=N*(Any)(Ariy)Any[not=S]:=#0,N*C(=S)N, 
N*(Any)(Any)C( = S)C,N*(Any)(Any)(Any)Any,N*(Any)(Any)C:Hev,N :i =[f]HC:Hev,N*(Any[is=H,C 
])=C(N(Any[is=H,C])(C))(N(Any[is=H,C])(Any[is=H,C])),N(Any[is=H,C])=C(N*(Any[is=H,C]) 
(C))(N(Any[is=H,C])(Any[is=H,C])),N(Any[is=:H,C])=C(N(Any[is=H,C])(C))(N*(Any[is=H,C])( 
Any[is = H,C])),N*(:Hev)(:Hev) :-Hev,N*( =0)0] " } , 

    { FeatureHBA, 100, NITROGEN, 1, "N[1]C[2]:N:C:N:C(:@2)C(=O)NHC=@1" },
    { FeatureHBA, 100, NITROGEN, 0, "N[is=N*=N=N,N*(=N)=N]" },
    { FeatureHBA, 100, NITROGEN, 0, "N[is=N*(C)=NC]" },
    { FeatureHBA, 100, NITROGEN, 0,
      "N[is=N*(=C)N,N*[not=N*C=Het,N*C:Hev]N=C]" },
    { FeatureHBA, 100, SULFUR, 0,
      "S[is=S*[f]HAny,S*[f](Hev)Hev,S*=C(N)(N);not=S*Any~O]" },
    { FeatureHBD, 100, OXYGEN, 0, "OHAny[not=C=O,P,S]" },
    { FeatureHBD, 100, NITROGEN, 0, "NH" },
    { FeatureHBD, 100, SULFUR, 0, "SH" },

{ FeatureHBD, 100, NITROGEN, 1, 
"N[is = N*[l] = CNHC = C@l,N[l]:C:NH:C:C:@l,N*[l:f]:N[f]H:C:C:C:@l,N*[l:f]:N[f]H:C:C:N[q 
:@l,N*[l:f]:N[f]:N[f]H:C:C:@l,Nni:f]:C:C:N[f]:N[f]H:@l,Nni:f]:N[i]H:C:N[f]:C:@l,^^^ 
:N[f]:N[f]H:C:@l,Nni:fl:N[f]H:C:N[f]:C:@l,Nll:fl:C:N[f]H:N:C:@l,N*[l:f]:C:N[f]H:C:^ 
}, 

    { FeatureHBD, 100, NITROGEN, 0,
      "N[not=N*Hev=#:Het,N*O,N*Hev:Hev](Hev)(Hev)Hev" },
{ FeatureNone, -1, 0, 0, (char *) 0 } 

    };

    if ( numPatterns && currentSet == q_featureSet )
    {
        *r_numPatterns = numPatterns;
        if ( q_featureSet == UseUnityFeatures )
            return Unityfpats;
        else if ( q_featureSet == UseTopomerFeatures )
            return orig_top_fpats;
        else
            return Unityfpats_WeLike;
    }

    if ( q_featureSet == UseUnityFeatures )
        fpats = Unityfpats;
    else if ( q_featureSet == UseTopomerFeatures )
        fpats = orig_top_fpats;
    else
        fpats = Unityfpats_WeLike;
    currentSet = q_featureSet;

    memset((char *) sctrl, '\0', sizeof(Srch2Control) );
    sctrl->maxHits = 0;
    sctrl->searchControl = Srch2NoDuplicates;
    sctrl->charge = 1;
    sctrl->isotope = 1;
    sctrl->stereoSearch = 1;

    for ( numPatterns = 0, fptr = fpats; fptr->sln != (char *) 0; fptr++, numPatterns++ )
    {
        if ( !fptr->ct )
            fptr->ct = DB_IMPORT_SLN(fptr->sln);
        if ( !fptr->ct )
        {
            UTL_ERROR_CLEAR();
            fprintf(stderr, "Problems importing the feature pattern\n%s\n", fptr->sln );
            continue;
        }
        if ( !fptr->pattern && !DB_SRCH2_OPEN_PATTERN(fptr->ct, sctrl, &(fptr->pattern) ) )
        {
            UTL_ERROR_CLEAR();
            DB_CT_DELETE_CT(fptr->ct);
            fptr->pattern = (void *) 0;
            fprintf(stderr, "Problems building search pattern for the feature pattern\n%s\n",
                fptr->sln );
            continue;
        }
    }

    *r_numPatterns = numPatterns;
    if ( q_featureSet == UseUnityFeatures )
        return Unityfpats;
    else if ( q_featureSet == UseTopomerFeatures )
        return orig_top_fpats;
    else
        return Unityfpats_WeLike;
}


static int computeCentroid( double *cords, int *atoms, int numAtoms, double *r_x, double *r_y,
                            double *r_z )
{
    double x, y, z;
    double *cptr;
    int i;
    double divfact;

    if ( !cords || !atoms || numAtoms <= 0 || !r_x || !r_y || !r_z )
        return -1;

    divfact = (double) numAtoms;
    x = y = z = 0.0;
    for ( i = 0; i < numAtoms; i++ )
    {
        cptr = cords + (atoms[i] * 3);
        x += *cptr;
        y += *(cptr+1);
        z += *(cptr+2);
    }
    *r_x = x / divfact;
    *r_y = y / divfact;
    *r_z = z / divfact;
    return 0;


} 
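
As a quick check of the indexing convention used by computeCentroid (coordinates packed as x, y, z triples, with `atoms` holding 0-based atom indices), here is a self-contained sketch of the same averaging. The name `compute_centroid` and the sample coordinates are illustrative assumptions, not part of the listing.

```c
#include <assert.h>   /* only needed for the checks in the usage note */

/* Sketch: average the coordinates of the selected atoms.
   cords holds x,y,z triples; atoms holds 0-based indices into it.
   Returns 0 on success, -1 on bad arguments. */
static int compute_centroid(const double *cords, const int *atoms, int num_atoms,
                            double *r_x, double *r_y, double *r_z)
{
    double x = 0.0, y = 0.0, z = 0.0;
    int i;

    if (!cords || !atoms || num_atoms <= 0 || !r_x || !r_y || !r_z)
        return -1;
    for (i = 0; i < num_atoms; i++) {
        const double *c = cords + atoms[i] * 3;
        x += c[0];
        y += c[1];
        z += c[2];
    }
    *r_x = x / (double) num_atoms;
    *r_y = y / (double) num_atoms;
    *r_z = z / (double) num_atoms;
    return 0;
}
```

Selecting atoms 0, 1 and 2 from the points (0,0,0), (2,0,0), (1,3,0) and (9,9,9) gives the centroid (1,1,0); the fourth point is ignored because its index is not in the list.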


static void addCentroid(Frag *fptr, int natoms, double attFact, double x, double y, double z )
{
    double *cptr;
    double cdiff, xd, yd, zd;
    int i;
    int duplicate;

    for ( i = duplicate = 0; !duplicate && i < fptr->aromCnt; i++ )
    {
        cptr = fptr->cent + (i*4);
        xd = x - *cptr;
        yd = y - *(cptr+1);
        zd = z - *(cptr+2);
        cdiff = xd*xd + yd*yd + zd*zd;
        if ( cdiff < 0.1 )
            duplicate = 1;
    }

    if ( duplicate )
    {
        return;
    }

    fptr->cent = (double *) DB_CT_UTL_RECALLOC((char *) fptr->cent,
                     fptr->aromCnt * sizeof(double) * 4,
                     (fptr->aromCnt + 1) * sizeof(double) * 4 );
    cptr = fptr->cent + (fptr->aromCnt * 4);
    fptr->aromCnt++;

    *cptr = x;
    *(cptr+1) = y;
    *(cptr+2) = z;
    *(cptr+3) = attFact;
    return;

} 
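
addCentroid stores each ring centroid as four doubles (x, y, z and the attenuation weight) and silently drops a new centroid whose squared distance to an existing one is below 0.1. The sketch below shows the same bookkeeping with the standard `realloc` standing in for the toolkit's `DB_CT_UTL_RECALLOC`. That substitution, the `CentroidList` type and the `add_centroid` return codes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define CENTROID_STRIDE 4     /* x, y, z, weight */
#define DUPLICATE_SQ    0.1   /* squared-distance threshold from the listing */

/* Hypothetical container standing in for the cent/aromCnt fields of Frag. */
typedef struct {
    double *cent;
    int count;
} CentroidList;

/* Returns 1 if the centroid was stored, 0 if it duplicates an existing
   one, -1 if memory could not be allocated. */
static int add_centroid(CentroidList *list, double weight,
                        double x, double y, double z)
{
    double *grown, *c;
    int i;

    assert(list != NULL);
    for (i = 0; i < list->count; i++) {
        double xd, yd, zd;
        c = list->cent + i * CENTROID_STRIDE;
        xd = x - c[0];
        yd = y - c[1];
        zd = z - c[2];
        if (xd * xd + yd * yd + zd * zd < DUPLICATE_SQ)
            return 0;   /* same ring found by another pattern */
    }
    grown = realloc(list->cent,
                    sizeof(double) * CENTROID_STRIDE * (size_t) (list->count + 1));
    if (!grown)
        return -1;
    list->cent = grown;
    c = grown + list->count * CENTROID_STRIDE;
    c[0] = x;
    c[1] = y;
    c[2] = z;
    c[3] = weight;
    list->count++;
    return 1;
}
```

The duplicate check matters because several aromatic patterns (for example the aromatic-bond and Kekule forms of a six-membered ring) can hit the same ring, and each ring should contribute one centroid only.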


static int compareFields(double *orig, double *atombased, int npoints ) 
{ 

int i; 

    for ( i = 0; i < npoints; i++, orig++, atombased++ )
    {
        if ( ( fabs( *orig - *atombased ) ) > 0.1 )
        {
            fprintf(stderr, "field difference: %d of %6.3lf %6.2lf %6.2lf\n",
                i, *orig - *atombased, *orig, *atombased );

} 

} 

return i; 

} 


/* functions from here to "end of core funcs" are for core searching */ 

int TOP_CORE_QUERY( struct CtConnectionTable *ct, FILE *fp )
{
    static Split core_split[1];
    CtAtom *atom;
    int i;
    int k_atomicid = 19;
    int na_atomicid = 11;
    int Kid, Naid;
    int err = 0;
    int *atomMask;
    int *b1, *b2;
    int hevCnt;
    struct CtConnectionTable *dupct;
    Frag *f1, *f2;

    Kid = Naid = -1;

    for ( atom = ct->atoms, i = 0; i < ct->atomCount; i++, atom++ )
    {
        if ( atom->class != CtAtomElement )
            continue;
        if ( atom->id.atomicNumber == k_atomicid )
        {
            if ( Kid >= 0 )
                fprintf(stderr, "More than one K atom present in core query.\n");
            Kid = i;
            atom->id.atomicNumber = CARBON;
        }
        else if ( atom->id.atomicNumber == na_atomicid )
        {
            if ( Naid >= 0 )
                fprintf(stderr, "More than one Na atom present in core query.\n");
            Naid = i;
            atom->id.atomicNumber = CARBON;
        }
    }

    if ( Kid == -1 )
    {
        fprintf(stderr, "No K atom present in the core query.\n" );
        err = 3;
    }
    if ( Naid == -1 )
    {
        fprintf(stderr, "No Na atom present in the core query.\n" );
        err = 4;
    }
    if ( err )
        return err;

    atom = ct->atoms + Naid;
    stripCharge(ct, atom, Naid);
    atom = ct->atoms + Kid;
    stripCharge(ct, atom, Kid);


bl = (int *) malloc(ct- > atomCount * sizeof(int) ); 
b2 = (int *) malloc(ct- > atomCount * sizeof(int) ); 

for ( i = 0; i < ct-> atomCount; i++ ) 

{ 

bl[i] = b2[i] = 1; 

} 

bl[Kid] = -1; /* mark base atom */ 
b2[Naid] = -1; /* mark base atom */ 


memset((char *) core_split, '\0\ sizeof(Split) ); 


g_split2 = (split2 *) 0; 
    g_split3 = (split3 *) 0;

    g_splitcnt = g_splitalloc = g_split3Cnt = g_split3Alloc = 0;

    atomMask = createAtomMask(ct, q_termFlag, &hevCnt);
    addSplit2(1, b1, b2 );

    core_split->frags = createUniqFrags(ct->atomCount, g_split2, g_splitcnt, g_split3, g_split3Cnt,
                                        atomMask,
                                        &(core_split->numFrags) );

    core_split->s2 = g_split2;
    core_split->s2cnt = g_splitcnt;
    core_split->bondCount = ct->bondCount;
    core_split->atomCount = ct->atomCount;
    core_split->atomMask = atomMask;

    g_split2 = (split2 *) 0;
    g_splitcnt = g_splitalloc = 0;



    core_split->ct = ct;
    SearchForFeatures(core_split);
    qmode = 1;
    BuildFrags(core_split);
    BuildTopomers(ct, core_split, (Split *) 0);

    qmode = 0;

    if ( core_split->frags && fp )
    {
        f1 = core_split->frags;
        f2 = core_split->frags + 1;

        /* write the first fragment with the K and Na markers restored */
        dupct = DB_CT_UTL_DUP_CT(f1->ct, CtCopyKeepAllAttrs );
        atom = dupct->atoms + Kid;
        atom->id.atomicNumber = k_atomicid;
        atom = dupct->atoms + Naid;
        atom->id.atomicNumber = na_atomicid;
        setAttr(dupct, "CORESIM", "0");
        setAttr(dupct, "TS_QID", "0");
        DB_CT_WRITE(fp, dupct);
        DB_CT_DELETE_CT(dupct);

        /* write the second fragment the same way */
        dupct = DB_CT_UTL_DUP_CT(f2->ct, CtCopyKeepAllAttrs );
        atom = dupct->atoms + Kid;
        atom->id.atomicNumber = k_atomicid;
        atom = dupct->atoms + Naid;
        atom->id.atomicNumber = na_atomicid;
        setAttr(dupct, "CORESIM", "0");
        setAttr(dupct, "TS_QID", "0");
        DB_CT_WRITE(fp, dupct);
        DB_CT_DELETE_CT(dupct);

        UTL_ERROR_CLEAR();
    }


qs = core_split; 
return 0; 

} 


top_result *TOP_CORE_SEARCH(struct CtConnectionTable *ct, double radius, double max_attachpen,
                            int *r_hascore )
{
    Split *S;

    double f1, f2, f3, f4;
    double s1, s2, s3, s4;
    double a1, a2, a3, a4;
    Frag *q1, *q2;
    Frag *fs1, *fs2;
    split3 *ss3;

    double sval, sval2, sval3, sval4;
    int i, j;
    double best;
    double bestAttach;
    static top_result res[1];
    Frag *bestFrag, *altFrag;
    int idx = 0;
    CtAtom *atom, *atm2;
    char value[80];
    struct CtConnectionTable *dupct;
    int uniqId, hitId;

    memset((char *) res, '\0', sizeof(top_result) );
    q_bailout = radius * radius;

    max_attachpen *= max_attachpen;

    best = 999.9 * 999.9;
    bestAttach = max_attachpen;
    q_coremode = 1;

    S = FindBreakPoints(ct, q_minatoms, q_termFlag, TRUE );

    *r_hascore = 0;

if ( !S || S->s3cnt == 0) 

{ 

        q_coremode = 0;

        if ( S )
            freeSplit(S);
        return (top_result *) 0;
    }

    *r_hascore = S->s3cnt;

    S->ct = ct;
    SearchForFeatures(S);
    BuildFrags(S);

    q1 = qs->frags;
    q2 = qs->frags + 1;

    bestFrag = (Frag *) 0;

    /* first pass: discard splits that fail the attachment or feature screens */
    for ( j = 0, ss3 = S->s3; ss3 && j < S->s3cnt; j++, ss3++ )
    {
        fs1 = S->frags + ss3->frag1;
        fs2 = S->frags + ss3->frag2;

        if ( fs1->cords == (double *) 0 || fs2->cords == (double *) 0 )
        {
            continue;
        }

        atom = fs1->ct->atoms + fs1->copyBaseAtom;
        atm2 = fs1->ct->atoms + fs2->copyBaseAtom;

        if ( atom->bondCount > 1 || atm2->bondCount > 1 )
        {
            fs1->cords = fs2->cords = (double *) 0;
            continue;
        }

        if ( q_debugfp )
        {
            DB_CT_WRITE(q_debugfp, fs1->ct );
            DB_CT_WRITE(q_debugfp, fs2->ct );
            UTL_ERROR_CLEAR();
        }

        a1 = computeAttachmentPenalty(q1, fs1, q2, fs2 );
        a2 = computeAttachmentPenalty(q2, fs1, q1, fs2 );
        a3 = computeAttachmentPenalty(q1, fs2, q2, fs1 );
        a4 = computeAttachmentPenalty(q2, fs2, q1, fs1 );

        if ( a1 > max_attachpen && a2 > max_attachpen && a3 > max_attachpen && a4 > max_attachpen )
        {
            fs1->cords = fs2->cords = (double *) 0;
            continue;
        }

        f1 = compareFeatures(qs, q1, S, fs1, q2->copyBaseAtom, fs2->copyBaseAtom );
        f2 = compareFeatures(qs, q2, S, fs1, q1->copyBaseAtom, fs2->copyBaseAtom );
        f3 = compareFeatures(qs, q1, S, fs2, q2->copyBaseAtom, fs1->copyBaseAtom );
        f4 = compareFeatures(qs, q2, S, fs2, q1->copyBaseAtom, fs1->copyBaseAtom );

        sval = f1 + a1;
        sval2 = f2 + a2;
        sval3 = f3 + a3;
        sval4 = f4 + a4;

        if ( sval > q_bailout && sval2 > q_bailout && sval3 > q_bailout && sval4 > q_bailout )
        {
            fs1->cords = fs2->cords = (double *) 0;
            continue;
        }
    }

    BuildTopomers(ct, S, (Split *) 0 );

    /* second pass: score the surviving splits, including the steric field term */
    for ( j = 0, ss3 = S->s3; ss3 && j < S->s3cnt; j++, ss3++ )
    {
        fs1 = S->frags + ss3->frag1;
        fs2 = S->frags + ss3->frag2;

        if ( fs1->cords == (double *) 0 || fs2->cords == (double *) 0 )
            continue;

        a1 = computeAttachmentPenalty(q1, fs1, q2, fs2 );
        a2 = computeAttachmentPenalty(q2, fs1, q1, fs2 );
        a3 = computeAttachmentPenalty(q1, fs2, q2, fs1 );
        a4 = computeAttachmentPenalty(q2, fs2, q1, fs1 );

        f1 = compareFeatures(qs, q1, S, fs1, q2->copyBaseAtom, fs2->copyBaseAtom );
        f2 = compareFeatures(qs, q2, S, fs1, q1->copyBaseAtom, fs2->copyBaseAtom );
        f3 = compareFeatures(qs, q1, S, fs2, q2->copyBaseAtom, fs1->copyBaseAtom );
        f4 = compareFeatures(qs, q2, S, fs2, q1->copyBaseAtom, fs1->copyBaseAtom );

        s1 = topFieldCompressedDiff(q1->qtf[fs1->regionIdx], fs1->topField, fs1->npoints,
        s2 = topFieldCompressedDiff(q2->qtf[fs1->regionIdx], fs1->topField, fs1->npoints,

        s3 = topFieldCompressedDiff(q1->qtf[fs2->regionIdx], fs2->topField, fs2->npoints,
        s4 = topFieldCompressedDiff(q2->qtf[fs2->regionIdx], fs2->topField, fs2->npoints,

        sval = f1 + a1 + s1;

        if ( sval < best && a1 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s1;
            res->featureDiffs[0] = f1;
            res->attachmentPenalty = a1;
            bestFrag = fs1;
            altFrag = fs2;
            idx = 0;
        }

        sval = f2 + a2 + s2;

        if ( sval < best && a2 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s2;
            res->featureDiffs[0] = f2;
            res->attachmentPenalty = a2;
            bestFrag = fs1;
            altFrag = fs2;
            idx = 1;
        }

        sval = f3 + a3 + s3;

        if ( sval < best && a3 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s3;
            res->featureDiffs[0] = f3;
            res->attachmentPenalty = a3;
            bestFrag = fs2;
            altFrag = fs1;
            idx = 0;
        }

        sval = f4 + a4 + s4;

        if ( sval < best && a4 < max_attachpen )
        {
            best = sval;
            res->hexDiffs[0] = s4;
            res->featureDiffs[0] = f4;
            res->attachmentPenalty = a4;
            bestFrag = fs2;
            altFrag = fs1;
            idx = 1;
        }
    }

if ( best < q_bailout ) 

{ 

        if ( best < 0.0 )
            best = 0.0;
        res->comfa_diff = sqrt(best);
        sprintf(value, "%d", (int) res->comfa_diff );
        setAttr(bestFrag->ct, "CORESIM", value );

        sprintf(value, "%d", (int) sqrt(res->attachmentPenalty) );
        setAttr(bestFrag->ct, "TS_ATTACH_PEN", value );

        sprintf(value, "%d", (int) sqrt(res->featureDiffs[0]) );
        setAttr(bestFrag->ct, "TS_FEATURE", value );

        sprintf(value, "%d", (int) sqrt(res->hexDiffs[0]) );
        setAttr(bestFrag->ct, "TS_STERIC", value );

        sprintf(value, "%d", idx+1 );
        setAttr(bestFrag->ct, "TS_QID", value );

        res->strFrags[0] = DB_CT_UTL_DUP_CT(S->ct, CtCopyKeepAllAttrs );
        res->strFrags[1] = DB_CT_UTL_DUP_CT(bestFrag->ct, CtCopyKeepAllAttrs );
        dupct = res->strFrags[1];

        if ( idx == 1 )
        {
            atom = dupct->atoms + bestFrag->copyBaseAtom;
            atom->id.atomicNumber = 11;
            stripCharge(dupct, atom, bestFrag->copyBaseAtom );
            atom = dupct->atoms + altFrag->copyBaseAtom;
            atom->id.atomicNumber = 19;
            stripCharge(dupct, atom, altFrag->copyBaseAtom);
        }
        else
        {
            atom = dupct->atoms + bestFrag->copyBaseAtom;
            atom->id.atomicNumber = 19;
            stripCharge(dupct, atom, bestFrag->copyBaseAtom);
            atom = dupct->atoms + altFrag->copyBaseAtom;
            atom->id.atomicNumber = 11;
            stripCharge(dupct, atom, altFrag->copyBaseAtom);
        }

        dupCheckCore(dupct, &uniqId, &hitId );

        sprintf(value, "%d", uniqId);
        setAttr(dupct, "TS_UNIQ_ID", value );

        sprintf(value, "%d", hitId);
        setAttr(dupct, "TS_HIT_ID", value );

        freeSplit(S);
        q_coremode = 0;
        return res;
    }

    q_coremode = 0;
    freeSplit(S);
    return (top_result *) 0;
}
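
The scoring loop in TOP_CORE_SEARCH evaluates four query/candidate fragment pairings, sums the feature difference, attachment penalty, and steric difference for each, and keeps the lowest total whose attachment penalty is under the limit. The following is a minimal self-contained sketch of that selection step only. The Pairing struct and pickBestPairing are hypothetical names that do not appear in the listing, and the numeric inputs stand in for the values that compareFeatures, computeAttachmentPenalty, and topFieldCompressedDiff would supply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one query/candidate pairing. */
typedef struct {
    double feature;   /* feature difference term */
    double attach;    /* attachment penalty term */
    double steric;    /* steric field difference term */
} Pairing;

/*
 * Return the index of the pairing with the lowest combined score among
 * those that beat *best and whose attachment penalty is below maxAttach,
 * or -1 if none qualifies.  *best is updated to the winning score.
 */
static int pickBestPairing(const Pairing *p, int n, double maxAttach, double *best)
{
    int i, winner = -1;

    assert(p != NULL && best != NULL);

    for (i = 0; i < n; i++) {
        double sval = p[i].feature + p[i].attach + p[i].steric;

        if (sval < *best && p[i].attach < maxAttach) {
            *best = sval;
            winner = i;
        }
    }
    return winner;
}
```

Because the comparison is strict, an earlier pairing wins a tie, which matches the order in which the listing tests its four combinations.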

static void stripCharge(struct CtConnectionTable *ct, CtAtom *aptr, int atomidx)
{
    int relop, charge;

    if ( aptr->attributeMask & CtAtomFormalCharge )
    {
        charge = 0;

        /* the toolkit calls take a 1-based atom index */
        if ( DB_CT_GET_ANY_ATOM_ATTR(ct, atomidx + 1, CtAtomFormalCharge, &charge,
                                     &relop ) )
        {
            if ( charge > 0 )
                DB_CT_UTL_SUB_ANY_ATOM_ATTR(ct, atomidx + 1,
                                            CtAtomFormalCharge );
        }

        UTL_ERROR_CLEAR();
    }
}

static int dupCheckCore(struct CtConnectionTable *ct, int *r_uniqid, int *r_hitid ) 

{ 

    static UniqSln *uniqSlns;
    static int uniqAlloc;
    static int uniqCnt;
    UniqSln *uptr;

    int i;
    struct CtConnectionTable *dupct;
    char *sln;
    unsigned int crc;

    dupct = DB_CT_UTL_DUP_CT(ct, CtCopyKeepAttrs );
    DB_CT_UNIQ(dupct);

    sln = DB_CT_SLN_GENERATE_NOATTR(dupct, (int **) 0);
    crc = DB_CT_HOLO_GEN_CRC(sln);

DB_CT_DELETE_CT(dupct); 


    for ( i = 0, uptr = uniqSlns; i < uniqCnt; i++, uptr++ )
    {
        if ( uptr->crc == crc && !strcmp(uptr->sln, sln ) )
        {
            uptr->hitcnt++;
            *r_uniqid = i+1;
            *r_hitid = uptr->hitcnt;
            UTL_MEM_FREE(sln);
            return uptr->hitcnt;
        }
    }

    if ( uniqCnt >= uniqAlloc )
    {
        if ( uniqSlns )
        {
            uniqAlloc *= 2;
            uniqSlns = (UniqSln *) realloc((char *) uniqSlns, uniqAlloc * sizeof(UniqSln) );
        }
        else
        {
            uniqAlloc = 100;
            uniqSlns = (UniqSln *) malloc(uniqAlloc * sizeof(UniqSln) );
        }
    }

    uptr = uniqSlns + uniqCnt;
    uptr->sln = sln;
    uptr->crc = crc;
    uptr->hitcnt = 1;
    uniqCnt++;

    *r_uniqid = uniqCnt;
    *r_hitid = uptr->hitcnt;

    return 0;
}
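
dupCheckCore keeps a growable table of structures already seen, comparing a checksum first and the full SLN string only when the checksums match. The following is a minimal sketch of that pattern using only the C standard library. UniqTable, UniqEntry, simpleHash, and recordKey are hypothetical names, and the djb2-style hash merely stands in for the toolkit's CRC routine, whose definition is not shown in the listing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: one unique key and its hit count. */
typedef struct {
    char *key;
    unsigned int hash;
    int hitcnt;
} UniqEntry;

/* Hypothetical growable table; zero-initialize before first use. */
typedef struct {
    UniqEntry *entries;
    int count;
    int alloc;
} UniqTable;

/* Stand-in string hash (djb2 style); not the toolkit's CRC. */
static unsigned int simpleHash(const char *s)
{
    unsigned int h = 5381u;

    while (*s)
        h = h * 33u + (unsigned char) *s++;
    return h;
}

/*
 * Record key in the table.  *r_uniqid receives the 1-based slot of the
 * key and *r_hitid the number of times it has been seen.
 * Returns 0 on success, -1 on allocation failure.
 */
static int recordKey(UniqTable *t, const char *key, int *r_uniqid, int *r_hitid)
{
    unsigned int h;
    size_t len;
    char *copy;
    int i;

    assert(t != NULL && key != NULL && r_uniqid != NULL && r_hitid != NULL);
    h = simpleHash(key);

    /* cheap hash comparison first, full string comparison only on a match */
    for (i = 0; i < t->count; i++) {
        UniqEntry *e = &t->entries[i];

        if (e->hash == h && strcmp(e->key, key) == 0) {
            e->hitcnt++;
            *r_uniqid = i + 1;
            *r_hitid = e->hitcnt;
            return 0;
        }
    }

    /* grow by doubling, checking realloc before overwriting the pointer */
    if (t->count >= t->alloc) {
        int newAlloc = t->alloc ? t->alloc * 2 : 4;
        UniqEntry *grown = realloc(t->entries, (size_t) newAlloc * sizeof *grown);

        if (!grown)
            return -1;
        t->entries = grown;
        t->alloc = newAlloc;
    }

    len = strlen(key) + 1;
    copy = malloc(len);
    if (!copy)
        return -1;
    memcpy(copy, key, len);

    t->entries[t->count].key = copy;
    t->entries[t->count].hash = h;
    t->entries[t->count].hitcnt = 1;
    t->count++;

    *r_uniqid = t->count;
    *r_hitid = 1;
    return 0;
}
```

Unlike the listing, this sketch copies the key and checks the result of realloc; both choices are assumptions made so the example runs on its own.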
int *TOP_MATRIX_SEARCH(char **slns, int numSlns )
{
    int i, j;
    int *matrix;
    int offset;
    struct CtConnectionTable *ct;
    struct CtConnectionTable *largest;
    Split **splits;
    Split *S;
    Split *QS;
    double *cord;
    int natoms;
    Frag *fptr;
    double comfa_diff;
    double radius;
    int nParts;
    int idx;
    int modified;
    int junk;
    double junk2;
    int qidx, sidx, splitidx, splitInThree;
    double best2;
    double best3;
    double attachPen;
    int bailedout = 0;
    int tfrags = 0;


    matrix = (int *) malloc( numSlns * numSlns * sizeof(int) );
    splits = (Split **) calloc(numSlns, sizeof(Split *) );

    radius = 2000.0;
    q_bailout = radius * radius; /* just force it very high */

#if 0
    q_minatoms = 3;
    q_termFlag = 1;
#endif

    q_matrixMode = 1;
    TOP_STER_REGION_MODE(2);

    for ( i = 0; i < numSlns; i++ )
    {
        fprintf(stderr, "initializing %d for matrix total Frags: %d\n", i+1, tfrags);
        ct = DB_IMPORT_SLN(slns[i]);

        if ( !ct )
        {
            UTL_ERROR_CLEAR();
            splits[i] = (Split *) 0;
            continue;
        }

        cord = (double *) 0;

        DB_CT_GET_CT_ATTR(ct, CtCt3DCoordSet, &cord, &natoms );
        if ( !cord )
        {
            DB_CT_DELETE_CT(ct);
            splits[i] = (Split *) 0;
            continue;
        }

        DB_CT_UTL_COUNT_FRAGS(ct, 0, (int *) 0, 0, (int *) 0, &nParts );

        /* keep only the largest piece of a multi-part structure */
        if ( nParts > 1 )
        {
            largest = getLargestFrag(ct);
            DB_CT_DELETE_CT(ct);
            ct = largest;
        }

        DB_CT_NORM_AROM(ct);
        DB_CT_STANDARD(ct, &modified);
        DB_CT_UTL_FIND_RINGS(ct);
        UTL_ERROR_CLEAR();

        S = FindBreakPoints(ct, q_minatoms, q_termFlag, TRUE );
        if ( q_termFlag )
            j = q_minatoms - 1;
        else
            j = q_minatoms;

        /* relax the minimum fragment size until a two-way split is found */
        while ( (!S || S->s2cnt == 0) && j >= 3 )
        {
            if ( S )
                freeSplit(S);
            S = FindBreakPoints(ct, j, 0, TRUE );
            q_minatoms = j;
            j--;
        }

        if ( S && S->s2cnt == 0 )
        {
            freeSplit(S);
            S = (Split *) 0;
        }

        splits[i] = S;
        if ( !S )
            continue;
        tfrags += S->numFrags;
        S->ct = ct;
        SearchForFeatures(S);
        BuildFrags(S);
        BuildTopomers(ct, S, (Split *) 0);

        for ( j = 0, fptr = S->frags; j < S->numFrags; j++, fptr++ )
        {
            fptr->qtf[0] = fptr->topField;
        }
        freeFragCts(S);
    }

    fprintf(stderr, "Finished initializing for matrix\n" );


    for ( i = 0; i < numSlns; i++ )
    {
        QS = splits[i];
        qs = QS;

        for ( j = 0; j < numSlns; j++ )
        {
            idx = i*numSlns + j;

            if ( i == j )
            {
                matrix[idx] = 0;
                continue;
            }

            S = splits[j];
            if ( !QS || !S )
            {
                if ( !QS && !S )
                    matrix[idx] = 0;    /* both don't have coordinates */
                else
                    matrix[idx] = 5000; /* one of them doesn't */
                continue;
            }

            if ( q_featureFactor > 0.0 )
                comfa_diff = CompareAllFeatures(QS, S, radius);
            comfa_diff = CompareTwoCompounds(QS, S, radius, &qidx, &sidx, &splitidx,
                                             &splitInThree, &junk,
                                             &best2, &best3, &junk2, &attachPen, bailedout );
            matrix[idx] = (int) comfa_diff;
        }

        freeStrMap(QS);
        fprintf(stderr, "pass %d completed", i+1 );
    }

    q_matrixMode = 0;
    return matrix;
}

struct CtConnectionTable *getLargestFrag(struct CtConnectionTable *ct ) 
{ 

struct CtConnectionTable **cts; 
    struct CtConnectionTable *largest;
    int maxAtoms;
    int currAtoms;
    int *whichPiece;
    int nParts;
    int idx;
    int *atoms;
    int natoms;
    int i;
    int *ordering;

    DB_CT_UTL_SPLIT_CT(ct, &nParts, &cts, &whichPiece, (int **) 0);
    largest = cts[0];

    DB_CT_GET_CT_ATTR(largest, CtCtAtomCount, &maxAtoms );
    idx = 1;
for (i = 1; i < nParts; i++ ) 
{ 

        DB_CT_GET_CT_ATTR(cts[i], CtCtAtomCount, &currAtoms );
if ( currAtoms > maxAtoms ) 

{ 

largest = cts[i]; 
maxAtoms = currAtoms; 
idx = i+1; 

} 

} 

    atoms = (int *) calloc(ct->atomCount, sizeof(int) );
    for ( natoms = 0, i = 1; i <= ct->atomCount; i++ )
    {
        if ( whichPiece[i] == idx )
        {
            atoms[natoms] = i;
            natoms++;

} 

} 

    largest = DB_CT_UTL_COPY_CT(ct, natoms, atoms, &ordering, CtCopyKeepAllAttrs );
    for ( i = 0; i < nParts; i++ )
        DB_CT_DELETE_CT(cts[i]);
    free((char *) atoms );

    return largest;
}
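
getLargestFrag finds the piece with the most atoms and then gathers the atoms that belong to it. The following is a minimal sketch of that selection on a plain label array, independent of the connection-table toolkit. The function largestPieceAtoms is hypothetical, and the sketch assumes 0-based atom indices with 1-based piece numbers, which may differ from the toolkit's own indexing convention.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper.  whichPiece[i] is the 1-based piece number of
 * atom i (0-based atom index, an assumption of this sketch).  On success,
 * *r_atoms points to a newly allocated array holding the indices of the
 * atoms in the largest piece (caller frees) and the count is returned.
 * Ties go to the lowest-numbered piece.  Returns -1 on allocation failure.
 */
static int largestPieceAtoms(const int *whichPiece, int atomCount, int nParts,
                             int **r_atoms)
{
    int *sizes, *atoms;
    int i, best = 1, natoms = 0;

    assert(whichPiece != NULL && r_atoms != NULL);
    assert(atomCount > 0 && nParts > 0);

    sizes = calloc((size_t) nParts + 1, sizeof *sizes);
    atoms = calloc((size_t) atomCount, sizeof *atoms);
    if (!sizes || !atoms) {
        free(sizes);
        free(atoms);
        return -1;
    }

    /* count the atoms in each piece */
    for (i = 0; i < atomCount; i++) {
        assert(whichPiece[i] >= 1 && whichPiece[i] <= nParts);
        sizes[whichPiece[i]]++;
    }

    /* pick the piece with the most atoms */
    for (i = 2; i <= nParts; i++) {
        if (sizes[i] > sizes[best])
            best = i;
    }

    /* collect the atoms belonging to that piece */
    for (i = 0; i < atomCount; i++) {
        if (whichPiece[i] == best)
            atoms[natoms++] = i;
    }

    free(sizes);
    *r_atoms = atoms;
    return natoms;
}
```

The strict comparison keeps the first piece on a tie, the same behavior as the `currAtoms > maxAtoms` test in the listing.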